Abstract —Spatial wireless channel prediction is important for 
future wireless networks, and in particular for proactive resource 
allocation at different layers of the protocol stack. Various sources 
of uncertainty must be accounted for during modeling and to 
provide robust predictions. We investigate two channel prediction 
frameworks, classical Gaussian processes (cGP) and uncertain 
Gaussian processes (uGP), and analyze the impact of location 
uncertainty during learning/training and prediction/testing, for 
scenarios where measurement uncertainties are dominated by 
large-scale fading. We observe that cGP generally fails both in 
terms of learning the channel parameters and in predicting the 
channel in the presence of location uncertainties. In contrast, 
uGP explicitly considers the location uncertainty. Using simulated 
data, we show that uGP is able to learn and predict the wireless 
channel. 

Index Terms —Gaussian processes, uncertain inputs, location 
uncertainty, spatial predictability of wireless channels. 

I. Introduction 

LOCATION-based resource allocation schemes are expected 
to become an essential element of emerging 
5G networks, as 5G devices will have the capability to 
accurately self-localize and predict relevant channel quality 
metrics (CQM) [1]-[3] based on crowd-sourced databases. 
The geo-tagged CQM (including, e.g., received signal strength, 
delay spread, and interference levels) from users enables the 
construction of a dynamic database, which in turn allows the 
prediction of CQM at arbitrary locations and future times. Current 
standards are already moving in this direction through the 
so-called minimization of drive test (MDT) feature in 3GPP 
Release 10 0. In MDT, users collect radio measurements 
and associated location information in order to assess network 
performance. In terms of applications, prediction of spatial 
wireless channels (e.g., through radio environment maps) and 
its utilization in resource allocation can reduce overheads and 
delays due to the ability to predict channel quality beyond 
traditional time scales E). Exploitation of location-aware 
CQM is relevant for interference management in two-tier 
cellular networks 0, coverage hole detection and prediction 
0, cooperative spectrum sensing in cognitive radios fTlI . 
anticipatory networks for predictive resource allocation 0, 
and proactive caching El. 

In order to predict location-dependent radio propagation 
channels, we rely on mathematical models, in which the 
physical environment, including the locations of transmitter 
and receiver, plays an important role. The received signal 
power in a wireless channel is mainly affected by three major 
dynamics, which occur at different length scales: path-loss, 
shadowing, and small-scale fading 0. Small-scale fading 
decorrelates within tens of centimeters (depending on the 
carrier frequency), making it infeasible to predict based on 
location information. On the other hand, shadowing is correlated 
up to tens of meters, depending on the propagation 
environment (e.g., 50-100 m for outdoor 0 and 1-2 m for indoor 
environments nni). Finally, path-loss captures the 
deterministic decay of power with distance to the 
transmitter. In rich scattering 
environments, the measurements average small-scale fading 
either in frequency or space, provided there is sufficient bandwidth 
or a sufficient number of antennas Ho). Thus, provided that measurements 
are dominated by large-scale fading, location-dependent 
models for path-loss and shadowing can be developed based 
on the physical properties of the wireless channel. With the 
help of spatial regression tools, these large-scale channel 
components can be predicted at other locations and used for 
resource allocation 0. However, since localization is subject 
to various error sources (e.g., the global positioning system 
(GPS) gives an accuracy of around 10 m IfTTI in outdoor 
scenarios, while ultra-wide band (UWB) systems can give sub-meter 
accuracy), there is a fundamental need to account for 
location uncertainties when developing spatial regression tools. 

Spatial regression tools generally comprise a training/learning 
phase, in which the underlying channel parameters 
are estimated based on the available training database, 
and a testing/prediction phase, in which predictions are made 
at test locations, given learned parameters and the training 
database. Among such tools, Gaussian processes (GP) is a 
powerful and commonly used regression framework, since it 
is generally considered to be the most flexible and provides 
prediction uncertainty information ifT^ . Two important limitations 
of GP are its computational complexity m-na and its 
sensitivity to uncertain inputs ifTTIl . lfT7ll - ll2T]| . To alleviate the 
computational complexity, various sparse GP techniques have 
been proposed in IfT^ - lfTSlI . while online and distributed GP 
were treated in ifT^ . f22\ . ||2^ and Il24l - ll26l . respectively. The 
impact of input uncertainty was studied in El, El, which 
showed that GP was adversely affected, both in training and 
testing, by input uncertainties. The input uncertainty in our 
case corresponds to location uncertainty. 

No framework has yet been developed to mathematically 
characterize and understand the spatial predictability of wireless 
channels with location uncertainty. In this paper, we build 
on and adapt the framework from [17], [18] to CQM prediction 
in wireless networks. Our main contributions are as follows: 

• We show that not considering location uncertainty leads 
to poor learning of the channel parameters and poor 
prediction of CQM values at other locations, especially 
when location uncertainties are heterogeneous; 

• We relate and unify existing GP methods that account 
for uncertainty during both learning and prediction, by 
operating directly on an input set of distributions, rather 
than an input set of locations; 

• We describe and delimit proper choices for mean functions 
and covariance functions in this unified framework, 
so as to incorporate location uncertainty in both learning 
and prediction; and 

• We demonstrate the use of the proposed framework for 
simulated data and apply it to a spatial resource allocation 
application. 

The remainder of the paper is structured as follows. Section III 
presents the channel model and details the problem description 
for location-dependent channel prediction with location 
uncertainty. In Section IV, we review channel learning and 
prediction in the classical GP (cGP) setup with no localization 
errors. Section V details learning and prediction procedures 
using the proposed GP framework that accounts for uncertainty 
on training and test locations, termed uncertain GP (uGP). 
Finally, numerical results are given in Section VI in addition 
to a resource allocation example, followed by our conclusions 
in Section VII. 

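Notation: Vectors and matrices are written in bold (e.g., 
a vector $\mathbf{k}$ and a matrix $\mathbf{K}$); $\mathbf{K}^T$ denotes transpose of $\mathbf{K}$; 
$|\mathbf{K}|$ denotes determinant of $\mathbf{K}$; $[\mathbf{K}]_{ij}$ denotes entry $(i,j)$ of 
$\mathbf{K}$; $\mathbf{I}$ denotes identity matrix of appropriate size; $\mathbf{1}$ and $\mathbf{0}$ are 
vectors of ones and zeros, respectively, of appropriate size; 
$\|\cdot\|$ denotes the $L_2$-norm unless otherwise stated; $\mathbb{E}[\cdot]$ denotes the 
expectation operator; $\mathrm{Cov}[\cdot]$ denotes covariance operator (i.e., 
$\mathrm{Cov}[\mathbf{y}_1, \mathbf{y}_2] = \mathbb{E}[\mathbf{y}_1 \mathbf{y}_2^T] - \mathbb{E}[\mathbf{y}_1]\,\mathbb{E}[\mathbf{y}_2]^T$); $\mathcal{N}(\mathbf{x}; \mathbf{m}, \boldsymbol{\Sigma})$ denotes 
a Gaussian distribution evaluated in $\mathbf{x}$ with mean vector $\mathbf{m}$ 
and covariance matrix $\boldsymbol{\Sigma}$, and $\mathbf{x} \sim \mathcal{N}(\mathbf{m}, \boldsymbol{\Sigma})$ denotes that $\mathbf{x}$ is 
drawn from a Gaussian distribution with mean vector $\mathbf{m}$ and 
covariance matrix $\boldsymbol{\Sigma}$. Important symbols used in the paper 
are: $\mathbf{x}_i \in \mathcal{A}$ is an exact, true location; $\mathbf{u}_i \in \mathbb{R}^D$, $D \geq 2$, 
is a vector that describes (e.g., in the form of moments) the 
location distribution $p(\mathbf{x}_i)$. For example, in the case of Gaussian 
distributed localization error, $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \mathbf{z}, \boldsymbol{\Sigma})$, a 
possible choice is $\mathbf{u} = [\mathbf{z}^T, \mathrm{vec}^T[\boldsymbol{\Sigma}]]^T$, where $\mathrm{vec}[\boldsymbol{\Sigma}]$ stacks all 
the elements of $\boldsymbol{\Sigma}$ in a vector. Finally, $\mathbf{z}_i = \phi(\mathbf{u}_i) \in \mathcal{A}$ is 
a location estimate extracted from $\mathbf{u}_i$ through a function $\phi(\cdot)$ 
(e.g., the mean or mode). 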

II. Related Work 

First, we give an overview of the literature on GP with uncertain 
inputs. One way to deal with the input noise is through 
linearizing the output around the mean of the input [19], [21]. 
In [21], the input noise was viewed as extra output noise by 
linearization at each point, with a variance proportional to the squared 
gradient of the GP posterior mean. However, the proposed 
method works under the condition of constant-variance input 
noise. In [19], a Delta method was used for linearization under 
the assumption of Gaussian distributed inputs, and 
a corrected covariance function that accounts for the input 
noise variance was proposed. For Gaussian distributed test inputs and known 
training inputs, the exact and approximate moments of the 
GP posterior were examined for various forms of covariance 
functions US). Training on Gaussian distributed input points 
by calculating the expected covariance matrix was studied in 
[17], [18]. Two approximations were evaluated in [27]: first, a 
joint maximization of the joint posterior on uncertain inputs and 
hyperparameters (leading to over-fitting), and second, 
a stochastic expectation-maximization algorithm (at a high 
computational cost). 

We now review previous works on GP for channel prediction, 
which include spatial correlation of shadowing in 
cellular 1^ and ad-hoc networks [29], as well as tracking 
of transmit powers of primary users in a cognitive network 
Il2^ . In ED, GP was shown to model spatially correlated 
shadowing to predict shadowing and path-loss at any arbitrary 
location. A multi-hop network scenario was considered in 
[29], and shadowing was modeled using a spatial loss field, 
integrated along a line between transmitter and receiver. In 
||2^ . a cognitive network setting was evaluated, in which 
the transmit powers of the primary users were tracked with 
cooperation among the secondary users. For this purpose a 
distributed radio channel tracking framework using Kriged 
Kalman filter was developed with location information. A 
study on the impact of underlying channel parameters on the 
spatial channel prediction variance using GP was presented 
in ll^ . A common assumption in ll23l . Il28l - ll^ was the 
presence of perfect location information. This assumption was 
partially removed in ED, which extends to include the 
effect of localization errors on spatial channel prediction. It 
was found that channel prediction performance was degraded 
when location errors were present, in particular when either 
the shadowing standard deviation or the shadowing correlation 
were large. However, ED did not tackle combined learning 
and prediction under location uncertainty. The only work that 
explicitly accounts for location uncertainty was [20], in which 
the Laplace approximation was used to obtain a closed-form 
analytical solution for the posterior predictive distribution. 
However, [20] did not consider learning of parameters in the 
presence of location uncertainty. 

III. System Model 

A. Channel Model 

Consider a geographical region $\mathcal{A} \subset \mathbb{R}^2$, where a source 
node is located at the origin and transmits a signal with power 
$P_{\mathrm{TX}}$ to a receiver located at $\mathbf{x}_i \in \mathcal{A}$ through a wireless 
propagation channel. The received radio signal is affected 
mainly by distance-dependent path-loss, shadowing due to 
obstacles in the propagation medium, and small-scale fading 
due to multipath effects. The received power PRx(xi) can be 
expressed as Chap. 2] 

$$P_{\mathrm{RX}}(\mathbf{x}_i) = P_{\mathrm{TX}}\, g_0\, \|\mathbf{x}_i\|^{-\eta}\, \psi(\mathbf{x}_i)\, |h(\mathbf{x}_i)|^2, \tag{1}$$

where $g_0$ is a constant that captures antenna and other propagation 
gains, $\eta$ is the path-loss exponent, $\psi(\mathbf{x}_i)$ is the location-dependent 
shadowing, and $h(\mathbf{x}_i)$ is the small-scale fading. We 
assume measurements average small-scale fading, either in 
time (measurements taken over a time window), frequency 
(measurements represent average power over a large frequency 
band), or space (measurements taken over multiple antennas) 
uni, ma. Therefore, the resulting received signal power from 
the source node to a receiver node $i$ can be expressed in dB 
scale as 

$$P_{\mathrm{RX}}(\mathbf{x}_i)[\mathrm{dBm}] = L_0 - 10\,\eta \log_{10}(\|\mathbf{x}_i\|) + \Psi(\mathbf{x}_i), \tag{2}$$

where $L_0 = P_{\mathrm{TX}}[\mathrm{dBm}] + G_0$ with $G_0 = 10 \log_{10}(g_0)$ and 
$\Psi(\mathbf{x}_i) = 10 \log_{10}(\psi(\mathbf{x}_i))$. A common choice for modeling 
shadowing in wireless systems is through a log-normal distribution, 
i.e., $\Psi(\mathbf{x}_i) \sim \mathcal{N}(0, \sigma_\Psi^2)$, where $\sigma_\Psi^2$ is the shadowing 
variance. Shadowing $\Psi(\mathbf{x}_i)$ is spatially correlated, with well-established 
correlation models 041 . among which the Gudmundson 
model is widely used 051 . Let $y_i$ be the scalar 
observation of the received power at node $i$, which is written 
as $y_i = P_{\mathrm{RX}}(\mathbf{x}_i) + n_i$, where $n_i$ is a zero mean additive white 
Gaussian noise with variance $\sigma_n^2$. For the sake of notational 
simplicity, we do not consider a three-dimensional layout, 
the impact of non-uniform antenna gain patterns, or distance-dependent 
path-loss exponents. 
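The dB-scale model in (2), together with the measurement noise, can be sketched numerically as follows. This is a minimal illustration, not the simulation setup of the paper: the function name `simulate_rx_power`, the default parameter values, the exponential (Gudmundson-type) shadowing correlation $\sigma_\Psi^2 \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|/d_c)$, and the small diagonal jitter are all assumptions made for the example.

```python
import numpy as np

def simulate_rx_power(X, L0=-10.0, eta=3.0, sigma_psi=8.0, d_c=50.0,
                      sigma_n=0.1, seed=None):
    """Draw y_i = L0 - 10*eta*log10(||x_i||) + Psi(x_i) + n_i in dBm.

    X is an (N, 2) array of receiver locations; the source is at the origin.
    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Deterministic path-loss part of (2)
    path_loss = L0 - 10.0 * eta * np.log10(np.linalg.norm(X, axis=1))
    # Exponentially correlated shadowing (assumed correlation model)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    C_psi = sigma_psi**2 * np.exp(-D / d_c)
    # A small jitter keeps the Cholesky factorization numerically stable
    chol = np.linalg.cholesky(C_psi + 1e-9 * np.eye(n))
    shadowing = chol @ rng.standard_normal(n)
    # Additive white Gaussian measurement noise n_i
    noise = sigma_n * rng.standard_normal(n)
    return path_loss + shadowing + noise, path_loss
```

Calling `simulate_rx_power(X, seed=0)` returns the noisy observations and the underlying path-loss component, which makes it easy to see how much of each observation is explained by distance alone.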

B. Location Error Model 

In practice, nodes may not have access to their true location 
$\mathbf{x}_i$, but only to a distribution $p(\mathbf{x}_i)$. The distribution $p(\mathbf{x}_i)$ is 
obtained from the positioning algorithm in the devices, and 
depends on the specific positioning technology (e.g., for GPS 
the distribution $p(\mathbf{x}_i)$ can be modeled as a Gaussian). We will 
assume that all distributions $p(\mathbf{x}_i)$ come from a given family of 
distributions (e.g., all bivariate Gaussian distributions). These 
distributions can be described by a finite set of parameters, 
$\mathbf{u}_i \in \mathbb{R}^D$, $D \geq 2$, e.g., a mean and a covariance matrix 
for Gaussian distributions. The set of descriptions of all 
distributions from the given family is denoted by $\mathcal{U} \subset \mathbb{R}^D$. 
Within this set, the set of all delta Dirac distributions over 
locations is denoted by $\mathcal{X} \subset \mathcal{U}$. Note that $\mathcal{X}$ is equivalent 
to the set $\mathcal{A}$ of possible locations. Finally, we introduce a 
function $\phi : \mathcal{U} \to \mathcal{A}$ that extracts a position estimate from 
the distribution (in our case chosen as the mean), and denote 
$\mathbf{z}_i = \phi(\mathbf{u}_i) \in \mathcal{A}$. We will generally make no distinction 
between a distribution $p(\mathbf{x}_i)$ and its representation $\mathbf{u}_i$. 

C. Problem Statement 

We assume a central coordinator, which collects a set 
of received power measurements $\mathbf{y} = [y_1, \ldots, y_N]^T$ with 
respect to a common source from $N$ nodes, along with their 
corresponding location distributions $\mathbf{U} = [\mathbf{u}_1^T, \mathbf{u}_2^T, \ldots, \mathbf{u}_N^T]^T$. 
Our goals are to perform 

*If measurements cannot average over small-scale fading, the proposed 
framework from this paper cannot be applied. 

^Vector measurements are also possible (e.g., from multiple base stations), 
but not considered here for the sake of clarity. 

^p(xi) is used for p{x = Xi) for notational simplicity. 



Figure 1. High-level comparison between cGP and uGP. The inputs to cGP 
during learning are observations Y and estimates Z of the (unobserved) actual 
locations X where those observations have been taken. Z is obtained through 
a positioning system. The true locations X are marked with a triangle and are 
generally different from the estimated locations Z, marked with a blue and 
red dot. During prediction, cGP predicts received power at an estimated test 
location, z*. In contrast, uGP considers the distribution of the locations X, 
described by U (and depicted by the red and blue circle), during learning. 
During prediction, uGP utilizes the distribution u* of the test location. Note 
that the amount of uncertainty (radius of the circle) can change. 

1) Learning: construct a spatial model (through estimating 
model parameters $\boldsymbol{\theta}$, to be defined later) of the received 
power based on the measurements; 

2) Prediction: determine the predictive distribution 
$p(P_{\mathrm{RX}}(\mathbf{x}_*) \mid \mathbf{y}, \mathbf{U}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*)$ of the power in test locations 
$\mathbf{x}_*$ and the distribution of the expected received power, 
$p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{y}, \mathbf{U}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*)$, for test location distributions 
$\mathbf{u}_*$. 

We will consider two methods for learning and prediction: 
classical GP (Section IV), which ignores location uncertainty 
and only considers $\mathbf{z}_i = \phi(\mathbf{u}_i)$, and uncertain GP (Section 
V), which is a method that explicitly accounts for location 
uncertainty. We introduce $\mathbf{X} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_N^T]^T$ and 
$\mathbf{Z} = [\mathbf{z}_1^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_N^T]^T$ as the collection of true and estimated 
locations, respectively. A high-level comparison of cGP and 
uGP is shown in Fig. 1, where cGP operates on Z and Y, 
while uGP operates on U and Y. 

IV. Channel Prediction with Classical GP 

We first present cGP under the assumption that all locations 
during learning and prediction are known exactly, based on 
ina. Eg). Later in this section, we will discuss the impact 
of location uncertainties on cGP in learning/training and 
prediction/testing. 

A. cGP without Location Uncertainty 

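We designate $\mathbf{x}_i \in \mathcal{A}$ as the input variable, and $P_{\mathrm{RX}}(\mathbf{x}_i)$ 
as the output variable. We model $P_{\mathrm{RX}}(\mathbf{x}_i)$ as a GP with 
mean function $\mu(\mathbf{x}_i) : \mathcal{A} \to \mathbb{R}$ and a positive semidefinite 
covariance function $C(\mathbf{x}_i, \mathbf{x}_j) : \mathcal{A} \times \mathcal{A} \to \mathbb{R}^+$, and we write 

$$P_{\mathrm{RX}}(\mathbf{x}_i) \sim \mathcal{GP}(\mu(\mathbf{x}_i), C(\mathbf{x}_i, \mathbf{x}_j)), \tag{3}$$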

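^Here, $P_{\mathrm{RX}}(\mathbf{u}_*)$ should be interpreted as the expected received power, 
$p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{y}, \mathbf{U}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*) = \int p(P_{\mathrm{RX}}(\mathbf{x}_*) \mid \mathbf{y}, \mathbf{U}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*)\, p(\mathbf{x}_*)\, \mathrm{d}\mathbf{x}_*$, where 
$p(\mathbf{x}_*)$ is described by $\mathbf{u}_*$. 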












where $\mathcal{GP}$ stands for a Gaussian process. The mean function 
is defined as $\mu(\mathbf{x}_i) = \mathbb{E}_{\Psi(\mathbf{x}_i)}[P_{\mathrm{RX}}(\mathbf{x}_i)] = L_0 - 
10\,\eta \log_{10}(\|\mathbf{x}_i\|)$, due to (2). The covariance function is 
defined as $C(\mathbf{x}_i, \mathbf{x}_j) = \mathrm{Cov}[P_{\mathrm{RX}}(\mathbf{x}_i), P_{\mathrm{RX}}(\mathbf{x}_j)]$. We will 
consider a class of covariance functions of the form: 

$$C(\mathbf{x}_i, \mathbf{x}_j) = \sigma_\Psi^2 \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^p}{d_c^p}\right) + \delta_{ij}\, \sigma_{\mathrm{proc}}^2, \tag{4}$$

where $\delta_{ij} = 1$ for $i = j$ and zero otherwise, $p \geq 1$, $d_c$ is 
the correlation distance of the shadowing, and $\sigma_{\mathrm{proc}}$ captures 
any noise variance term that is not due to measurement noise 
(more on this later). Setting $p = 1$ in (4) gives the exponential 
covariance function that is commonly used to describe the 
covariance properties of shadowing m, and $p = 2$ gives the 
squared exponential covariance function that will turn out to 
be useful in Section V-C. Note that the mean and covariance 
depend on 

$$\boldsymbol{\theta} = [\sigma_n, d_c, L_0, \eta, \sigma_\Psi, \sigma_{\mathrm{proc}}]^T, \tag{5}$$

which may not be known a priori. 
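To make the covariance class concrete, the sketch below evaluates it on a set of locations. It assumes that (4) has the form $\sigma_\Psi^2 \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^p / d_c^p) + \delta_{ij}\sigma_{\mathrm{proc}}^2$, as suggested by the description of $p$, $d_c$ and $\sigma_{\mathrm{proc}}$ in the text; the helper names `shadowing_cov` and `training_cov` are hypothetical.

```python
import numpy as np

def shadowing_cov(Xa, Xb, sigma_psi, d_c, p=1):
    """Cross-covariance sigma_psi^2 * exp(-(||xa - xb|| / d_c)^p).

    p=1 gives the exponential kernel, p=2 the squared exponential kernel.
    """
    Xa = np.atleast_2d(np.asarray(Xa, dtype=float))
    Xb = np.atleast_2d(np.asarray(Xb, dtype=float))
    D = np.linalg.norm(Xa[:, None, :] - Xb[None, :, :], axis=-1)
    return sigma_psi**2 * np.exp(-(D / d_c)**p)

def training_cov(X, sigma_psi, d_c, sigma_proc, p=1):
    """Covariance among training points, including the delta_ij term."""
    C = shadowing_cov(X, X, sigma_psi, d_c, p)
    # delta_ij * sigma_proc^2 only contributes on the diagonal (i = j)
    return C + sigma_proc**2 * np.eye(len(C))
```

Keeping the cross-covariance separate from the diagonal term is a design choice for the example: the same `shadowing_cov` can then be reused between training and test locations, where the $\delta_{ij}$ term does not apply.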

1) Learning: The objective during learning is to infer the 
model parameters $\boldsymbol{\theta}$ from observations $\mathbf{y}$ of the received power 
at N known locations X. The resulting training database is 
thus {X,y}. Due to the GP model, the joint distribution of 
the N training observations exhibits a Gaussian distribution 



Figure 2. Impact of location uncertainty for a one-dimensional example: 
the red curve depicts the received signal power $P_{\mathrm{RX}}(x)$ as a function of $x$ 
(or equivalently, the distance to the base station), while the markers show 
$P_{\mathrm{RX}}(x_i)$ as a function of $z_i = \phi(u_i)$. Training measurements are grouped 
into three regions: (+) corresponds to high uncertainty, (•) corresponds to 
low uncertainty, and (*) corresponds to medium uncertainty, respectively. The 
location uncertainty results in output noise. 


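$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}(\mathbf{X}), \mathbf{K}), \tag{6}$$

where $\boldsymbol{\mu}(\mathbf{X}) = [\mu(\mathbf{x}_1), \mu(\mathbf{x}_2), \ldots, \mu(\mathbf{x}_N)]^T$ is the mean 
vector and $\mathbf{K}$ is the covariance matrix of the measured 
received powers, with entries $[\mathbf{K}]_{ij} = C(\mathbf{x}_i, \mathbf{x}_j) + \sigma_n^2\, \delta_{ij}$. The 
model parameters can be learned through maximum likelihood 
estimation, given the training database $\{\mathbf{X}, \mathbf{y}\}$, by minimizing 
the negative log-likelihood function with respect to $\boldsymbol{\theta}$: 

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \{-\log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})\}. \tag{7}$$

The negative log-likelihood function is usually not convex and 
may contain multiple local optima. Additional details on the 
learning process are provided later. Once $\hat{\boldsymbol{\theta}}$ is determined from 
$\{\mathbf{X}, \mathbf{y}\}$, the training process is complete. 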
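The objective in (7) can be evaluated as in the following sketch, which could then be handed to any numerical optimizer. It is an illustration only: the parameter ordering `theta = (sigma_n, d_c, L0, eta, sigma_psi, sigma_proc)`, the function name, and the covariance form $\sigma_\Psi^2 \exp(-(\|\mathbf{x}_i - \mathbf{x}_j\|/d_c)^p)$ plus diagonal noise terms are assumptions of the example, and the Cholesky-based evaluation is a standard numerical choice rather than something prescribed by the text.

```python
import numpy as np

def neg_log_likelihood(theta, X, y, p=1):
    """-log N(y; mu(X), K) for the cGP model with known locations X."""
    sigma_n, d_c, L0, eta, sigma_psi, sigma_proc = theta  # assumed ordering
    n = len(y)
    # Path-loss mean function mu(x) = L0 - 10*eta*log10(||x||)
    mu = L0 - 10.0 * eta * np.log10(np.linalg.norm(X, axis=1))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # [K]_ij = C(x_i, x_j) + sigma_n^2 * delta_ij
    K = (sigma_psi**2 * np.exp(-(D / d_c)**p)
         + (sigma_proc**2 + sigma_n**2) * np.eye(n))
    chol = np.linalg.cholesky(K)
    w = np.linalg.solve(chol, y - mu)          # w = L^{-1} (y - mu)
    # 0.5*(y-mu)^T K^{-1} (y-mu) + 0.5*log|K| + 0.5*N*log(2*pi)
    return 0.5 * w @ w + np.log(np.diag(chol)).sum() + 0.5 * n * np.log(2 * np.pi)
```

Because the objective is generally non-convex, a practical learner would typically restart a local optimizer from several initial values of `theta` and keep the best result.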

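2) Prediction: After learning, we can determine the predictive 
distribution of $P_{\mathrm{RX}}(\mathbf{x}_*)$ at a new and arbitrary test 
location $\mathbf{x}_*$, given the training database $\{\mathbf{X}, \mathbf{y}\}$ and $\hat{\boldsymbol{\theta}}$. We 
first form the joint distribution 

$$\begin{bmatrix} \mathbf{y} \\ P_{\mathrm{RX}}(\mathbf{x}_*) \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}(\mathbf{X}) \\ \mu(\mathbf{x}_*) \end{bmatrix}, \begin{bmatrix} \mathbf{K} & \mathbf{k}_* \\ \mathbf{k}_*^T & k_{**} \end{bmatrix} \right), \tag{8}$$

where $\mathbf{k}_*$ is the $N \times 1$ vector of cross-covariances $C(\mathbf{x}_*, \mathbf{x}_i)$ 
between the received power at $\mathbf{x}_*$ and at the training locations 
$\mathbf{x}_i$, and $k_{**} = C(\mathbf{x}_*, \mathbf{x}_*)$ is the prior variance (i.e., the 
variance in the absence of measurements). 
Conditioning on the observations $\mathbf{y}$, we obtain the Gaussian 
posterior distribution $p(P_{\mathrm{RX}}(\mathbf{x}_*) \mid \mathbf{X}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*)$ for the test 
location $\mathbf{x}_*$. The mean ($\bar{P}_{\mathrm{RX}}(\mathbf{x}_*)$) and variance ($V_{\mathrm{RX}}(\mathbf{x}_*)$) of 
this distribution turn out to be 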

^Other ways of including the mean function in the model are possible, such 
as to include it in the covariance structure, and transform the prior model to 
a zero-mean GP prior QU 


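$$\begin{aligned} \bar{P}_{\mathrm{RX}}(\mathbf{x}_*) &= \mu(\mathbf{x}_*) + \mathbf{k}_*^T \mathbf{K}^{-1} (\mathbf{y} - \boldsymbol{\mu}(\mathbf{X})) \\ &= \mu(\mathbf{x}_*) + \sum_{i,j=1}^{N} [\mathbf{K}^{-1}]_{ij}\, (y_j - \mu(\mathbf{x}_j))\, C(\mathbf{x}_*, \mathbf{x}_i) \\ &= \mu(\mathbf{x}_*) + \sum_{i=1}^{N} \beta_i\, C(\mathbf{x}_*, \mathbf{x}_i), \end{aligned} \tag{9}$$

$$\begin{aligned} V_{\mathrm{RX}}(\mathbf{x}_*) &= k_{**} - \mathbf{k}_*^T \mathbf{K}^{-1} \mathbf{k}_* \\ &= k_{**} - \sum_{i,j=1}^{N} [\mathbf{K}^{-1}]_{ij}\, C(\mathbf{x}_*, \mathbf{x}_i)\, C(\mathbf{x}_*, \mathbf{x}_j), \end{aligned} \tag{10}$$

where $\beta_i = \sum_{j=1}^{N} [\mathbf{K}^{-1}]_{ij} (y_j - \mu(\mathbf{x}_j))$. In (9), $\mu(\mathbf{x}_*)$ 
corresponds to the deterministic path-loss component at $\mathbf{x}_*$, 
which is corrected by a term involving the database and the 
correlation between the measurements at the training locations 
and the test location. In (10), we see that the prior variance 
$k_{**}$ is reduced by a term that accounts for the correlation of 
nearby measurements. 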
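The posterior mean and variance in (9) and (10) amount to two linear solves with $\mathbf{K}$, as the following sketch shows. The function name `cgp_predict`, the parameter ordering in `theta`, the covariance form $\sigma_\Psi^2 \exp(-(\|\cdot\|/d_c)^p)$, and the choice $k_{**} = \sigma_\Psi^2 + \sigma_{\mathrm{proc}}^2$ are assumptions of the example.

```python
import numpy as np

def cgp_predict(x_star, X, y, theta, p=1):
    """Posterior mean and variance of the received power at x_star."""
    sigma_n, d_c, L0, eta, sigma_psi, sigma_proc = theta  # assumed ordering

    def mean_fn(P):
        # Path-loss mean function mu(x) = L0 - 10*eta*log10(||x||)
        return L0 - 10.0 * eta * np.log10(np.linalg.norm(P, axis=-1))

    def kern(A, B):
        D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return sigma_psi**2 * np.exp(-(D / d_c)**p)

    K = kern(X, X) + (sigma_proc**2 + sigma_n**2) * np.eye(len(X))
    k_star = kern(X, x_star[None, :])[:, 0]      # cross-covariances C(x*, x_i)
    k_ss = sigma_psi**2 + sigma_proc**2          # prior variance C(x*, x*)
    beta = np.linalg.solve(K, y - mean_fn(X))    # beta = K^{-1} (y - mu(X))
    mean = mean_fn(x_star) + k_star @ beta
    var = k_ss - k_star @ np.linalg.solve(K, k_star)
    return mean, var
```

Far from all training locations, `k_star` vanishes and the prediction falls back to the path-loss mean with the full prior variance; close to a training location, the mean is pulled toward the measurement and the variance shrinks.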

B. cGP with Location Uncertainty 

Now let us consider the case when the nodes do not have 
access to their true location $\mathbf{x}_i$, but only to a distribution $p(\mathbf{x}_i)$, 
which is described by $\mathbf{u}_i \in \mathcal{U}$. Fig. 2 illustrates the impact of 
location uncertainties assuming Gaussian location errors for 
a one-dimensional example. The figure shows (in red) the 
true received power $P_{\mathrm{RX}}(x)$ as a function of $x$ as well as 
the measured power $P_{\mathrm{RX}}(x_i)$ as a function of $z_i = \phi(u_i)$ 
for a discrete number of values of $i$, shown as markers. To 
clearly illustrate the impact of different amounts of uncertainty 
on the position, we have artificially created three regions: 
high location uncertainty close to the transmitter, medium 
location uncertainty far away, and low location uncertainty for 
intermediate distances. When there is no location uncertainty 
(70 m until 140 m from the transmitter), $z_i \approx x_i$, so 
$P_{\mathrm{RX}}(z_i) \approx P_{\mathrm{RX}}(x_i)$, and hence the black dots coincide 
with the red curve. For medium and high uncertainty, $z_i$ can 
differ significantly from $x_i$, so the data point with coordinates 
$[z_i, P_{\mathrm{RX}}(x_i)]$ can lie far away from the red curve, especially 
for high location uncertainty (distances below 70 m). From 
Fig. 2 it is clear that the input uncertainty manifests itself 
as output noise, with a variance that grows with increasing 
location uncertainty. This output noise must be accounted 
for in the model during learning and prediction. When these 
uncertainties are ignored, both learning and prediction will be 
of poor quality, as described below. 

1) Learning from uncertain training locations: In this case, 
the training database $\{\mathbf{Z}, \mathbf{y}\}$ comprises locations $\mathbf{z}_i = \phi(\mathbf{u}_i)$ 
and power measurements $y_i = P_{\mathrm{RX}}(\mathbf{x}_i) + n_i$ at the true (but 
unknown) locations $\mathbf{x}_i$. The measurements will be of the form 
shown in Fig. 2. The estimated model parameters $\hat{\boldsymbol{\theta}}$ can take 
two forms: (i) assign very short correlation distances $d_c$, large 
$\sigma_\Psi$, and small $\sigma_{\mathrm{proc}}$, as some seemingly nearby events will 
appear uncorrelated; or (ii) assign larger correlation distances 
$d_c$, smaller $\sigma_\Psi$, and explain the measurements by assigning 
a higher value to $\sigma_{\mathrm{proc}}$ lED. In the first case, correlations 
between measurements cannot be exploited, so that during 
prediction, the posterior mean will be close to the prior mean 
and the posterior variance will be close to the prior variance. 
In the second case, predictions will be better, as correlations 
can be exploited to reduce the posterior variance. However, the 
model must explain different levels of input uncertainty with 
a single covariance function, which can make no distinctions 
between locations with low, medium, or high uncertainty. This 
will lead to poor performance when location error statistics 
differ from node to node. 

2) Prediction at an uncertain test location: In the case 
where training locations are exactly known (i.e., $\mathbf{z}_i = \mathbf{x}_i$, $\forall i$), 
we may want to predict the power at an uncertain test location 
$\mathbf{u}_*$, made available to cGP in the form $\mathbf{z}_* = \phi(\mathbf{u}_*)$, while the 
true test location $\mathbf{x}_*$ is not known. This scenario can occur 
when a mobile user relies on a low-quality localization system 
and reports an erroneous location estimate to the base station. 
The wrong location has an impact on the predicted posterior 
distribution since the predicted mean $\mu(\mathbf{z}_*)$ will differ from 
the correct mean $\mu(\mathbf{x}_*)$. In addition, $\mathbf{k}_*$ will contain erroneous 
entries: the $j$-th entry will be too small when $\|\mathbf{z}_* - \mathbf{x}_j\| > 
\|\mathbf{x}_* - \mathbf{x}_j\|$ and too large when $\|\mathbf{z}_* - \mathbf{x}_j\| < \|\mathbf{x}_* - \mathbf{x}_j\|$. This 
will affect both the posterior mean (9) and variance (10). In the 
case where training locations are also unknown, i.e., $\mathbf{Z} \neq \mathbf{X}$, 
and $\mathbf{z}_* \neq \mathbf{x}_*$, these effects are further exacerbated by the 
improper learning of $\boldsymbol{\theta}$. 

V. Channel Prediction with Uncertain GP 

In the previous section, we have argued that cGP is unable 
to learn and predict properly when training or test locations are 

^In fact, the output noise induced by location uncertainty will also depend 
on the slope of $P_{\mathrm{RX}}(\mathbf{x}_i)$ around $\mathbf{x}_i$, since a locally flat function will lead to 
less output noise than a steep function, under the same location uncertainty. 


not known exactly, especially when location error statistics are 
heterogeneous. In this section, we explore several possibilities 
to explicitly incorporate location uncertainty. We recall that 
$\mathcal{U}$ denotes the set of all distributions over the locations in 
the environment $\mathcal{A}$, while $\mathcal{X} \subset \mathcal{U}$ represents the delta Dirac 
distributions over the positions and has a one-to-one mapping 
to $\mathcal{A}$. 

We will describe three approaches. First, a Bayesian approach 
where the uncertain input (i.e., the uncertain location) 
is marginalized, leading to a non-Gaussian output (i.e., the 
received power) distribution. Second, we derive a Gaussian 
approximation of the output distribution through moment 
matching and detail the corresponding learning and prediction 
expressions. From these expressions, the concepts of expected 
mean function and expected covariance function naturally 
appear. Finally, we discuss uncertain GP, which is a Gaussian 
process with input u from input set U and output y. We 
will relate these three approaches in a unified view. For 
each approach, we detail the quality of the solution and 
the computational complexity. We note that other approaches 
exist, e.g., through linearizing the output around the mean of 
the input [19], [21], but they are limited to mildly non-linear 
scenarios. 

A. Bayesian Approach 

In a Bayesian context, we learn and predict by integrating 
the respective distributions over the uncertainty of the training 
and test locations. As this method will involve Monte Carlo 
integration, we will refer to it as Monte Carlo GP (MCGP). 

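1) Learning: Given the training database $\{\mathbf{U}, \mathbf{y}\}$, the likelihood 
function with uncertain training locations $p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta})$ is 
obtained by integrating $p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$ over the random training 
locations: 

$$p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta}) = \int p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})\, p(\mathbf{X})\, \mathrm{d}\mathbf{X}, \tag{11}$$

where $p(\mathbf{X}) = \prod_{i=1}^{N} p(\mathbf{x}_i)$. As there is generally no closed-form 
expression for the integral (11), we resort to a Monte 
Carlo approach by drawing $M$ i.i.d. samples $\mathbf{X}^{(m)} \sim p(\mathbf{X})$, 
$1 \leq m \leq M$, so that 

$$\begin{aligned} p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta}) &\approx \frac{1}{M} \sum_{m=1}^{M} p(\mathbf{y} \mid \mathbf{X}^{(m)}, \boldsymbol{\theta}) \\ &= \frac{1}{M} \sum_{m=1}^{M} \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}(\mathbf{X}^{(m)}), \mathbf{K}^{(m)}), \end{aligned} \tag{12}$$

where $[\mathbf{K}^{(m)}]_{ij} = C(\mathbf{x}_i^{(m)}, \mathbf{x}_j^{(m)}) + \sigma_n^2\, \delta_{ij}$ and $\boldsymbol{\mu}(\mathbf{X}^{(m)}) = 
[\mu(\mathbf{x}_1^{(m)}), \mu(\mathbf{x}_2^{(m)}), \ldots, \mu(\mathbf{x}_N^{(m)})]^T$. Finally, an estimate of $\boldsymbol{\theta}$ 
can be found by minimizing the negative log-likelihood function 

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \{-\log p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta})\}, \tag{13}$$

which has to be solved numerically. 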

^For the sake of notation, all integrals in this section are written as 
indefinite integrals, however they should be understood as definite integrals 
over appropriate sets. 


Remark 1. This optimization involves high computational 
complexity and possibly numerical instability (due to the sum 
of exponentials). More importantly, a good estimate of $\boldsymbol{\theta}$ can 
only be found if a sample $\mathbf{X}^{(m)}$ is generated that is close 
to the true locations X. Due to the high dimensionality llJTl 
Section 29.2], this is unlikely, even for large $M$. Hence, (13) 
will lead to poor estimates of $\boldsymbol{\theta}$. 
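The Monte Carlo likelihood and the numerical issue mentioned in Remark 1 can be illustrated as follows. The sketch assumes isotropic Gaussian location errors with per-node standard deviation `S[i]` around the estimates `Z[i]`, the parameter ordering `theta = (sigma_n, d_c, L0, eta, sigma_psi, sigma_proc)`, and the covariance form $\sigma_\Psi^2 \exp(-(\|\cdot\|/d_c)^p)$; the function names are hypothetical. The log-sum-exp shift is a standard way to tame the sum of exponentials and is an addition of this example, not a step described in the text.

```python
import numpy as np

def log_likelihood(theta, X, y, p=1):
    """log N(y; mu(X), K) for one realization X of the training locations."""
    sigma_n, d_c, L0, eta, sigma_psi, sigma_proc = theta  # assumed ordering
    n = len(y)
    mu = L0 - 10.0 * eta * np.log10(np.linalg.norm(X, axis=1))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = (sigma_psi**2 * np.exp(-(D / d_c)**p)
         + (sigma_proc**2 + sigma_n**2) * np.eye(n))
    chol = np.linalg.cholesky(K)
    w = np.linalg.solve(chol, y - mu)
    return -0.5 * w @ w - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2 * np.pi)

def mcgp_neg_log_likelihood(theta, Z, S, y, M=200, p=1, seed=None):
    """Monte Carlo estimate of -log p(y | U, theta) from M location samples."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    S = np.asarray(S, dtype=float)
    logp = np.empty(M)
    for m in range(M):
        # Sample X^(m) ~ p(X), here an isotropic Gaussian around each z_i
        Xm = Z + S[:, None] * rng.standard_normal(Z.shape)
        logp[m] = log_likelihood(theta, Xm, y, p)
    # log of the sample average, computed with a log-sum-exp shift
    a = logp.max()
    return -(a + np.log(np.mean(np.exp(logp - a))))
```

Even with the shift, the average is typically dominated by the few samples with the largest likelihood, which is consistent with the remark that good estimates require a sample close to the true locations.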

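2) Prediction: Given the training database $\{\mathbf{U}, \mathbf{y}\}$ and 
$\hat{\boldsymbol{\theta}}$, we wish to determine $p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*)$ for 
an uncertain test location with associated distribution 
$p(\mathbf{x}_*)$, described by $\mathbf{u}_*$. The posterior predictive distribution 
$p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*)$ is obtained by integrating 
$p(P_{\mathrm{RX}}(\mathbf{x}_*) \mid \mathbf{X}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*)$ with respect to $\mathbf{X}$ and $\mathbf{x}_*$: 

$$p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*) = \iint p(P_{\mathrm{RX}}(\mathbf{x}_*) \mid \mathbf{X}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*)\, p(\mathbf{X})\, p(\mathbf{x}_*)\, \mathrm{d}\mathbf{X}\, \mathrm{d}\mathbf{x}_*. \tag{14}$$

This integral is again analytically intractable. The Laplace 
approximation was utilized in [20] to solve (14), while here 
we again resort to a Monte Carlo method by drawing $M$ 
i.i.d. samples $\mathbf{X}^{(m)} \sim p(\mathbf{X})$ and $\mathbf{x}_*^{(m)} \sim p(\mathbf{x}_*)$, so that 

$$\begin{aligned} p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*) &\approx \frac{1}{M} \sum_{m=1}^{M} p(P_{\mathrm{RX}}(\mathbf{x}_*^{(m)}) \mid \mathbf{X}^{(m)}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{x}_*^{(m)}) \\ &= \frac{1}{M} \sum_{m=1}^{M} \mathcal{N}(P_{\mathrm{RX}}(\mathbf{x}_*^{(m)}); \bar{P}_{\mathrm{RX}}(\mathbf{x}_*^{(m)}), V_{\mathrm{RX}}(\mathbf{x}_*^{(m)})). \end{aligned} \tag{15}$$

As $M$ increases, the approximate distribution will tend to the 
true distribution. We refer to (12) and (15) as Monte Carlo GP 
(MCGP). From (15), we can compute the mean ($\bar{P}_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*)$) 
and the variance ($V_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*)$) ll^ Eq. (14.10) and Eq. (14.11)] 
as 

$$\bar{P}_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*) = \frac{1}{M} \sum_{m=1}^{M} \bar{P}_{\mathrm{RX}}(\mathbf{x}_*^{(m)}), \tag{16}$$

$$V_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*) = \frac{1}{M} \sum_{m=1}^{M} \left( \bar{P}_{\mathrm{RX}}(\mathbf{x}_*^{(m)}) - \bar{P}_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*) \right)^2 + \frac{1}{M} \sum_{m=1}^{M} V_{\mathrm{RX}}(\mathbf{x}_*^{(m)}). \tag{17}$$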

Remark 2. Prediction is numerically straightforward, though 
it involves the inversion of an $N \times N$ matrix $\mathbf{K}^{(m)}$ for each 
of the $M$ samples $\mathbf{X}^{(m)}$. In the case training locations are 
known, we can utilize cGP to obtain a good estimate of $\boldsymbol{\theta}$ and 
efficiently and accurately compute $\bar{P}_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*)$ and $V_{\mathrm{RX}}^{\mathrm{MC}}(\mathbf{u}_*)$. 
When both training and test locations are known, the above 
procedure reverts to cGP. 
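The step from the per-sample Gaussians to the overall mean and variance is the usual moment computation for an equally weighted Gaussian mixture: the mixture variance is the spread of the component means plus the average component variance. A minimal sketch, assuming the per-sample posterior means and variances have already been computed (the function name `mixture_moments` is hypothetical):

```python
import numpy as np

def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of M Gaussians.

    means[m] and variances[m] are the cGP posterior mean and variance
    obtained for the m-th Monte Carlo sample of training/test locations.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mc_mean = means.mean()
    # Spread of the component means plus the average component variance
    mc_var = np.mean((means - mc_mean)**2) + variances.mean()
    return mc_mean, mc_var
```

For example, two components with means 1 and 3 and unit variances give a mixture mean of 2 and a mixture variance of 2: the spread of the means contributes 1 and the average variance contributes 1.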

B. Gaussian Approximation 

We have seen that while MCGP can account for location 
uncertainty during prediction, it will fail to deliver adequate 
estimates of $\boldsymbol{\theta}$ during learning (see Remark 1). To address this, 
we can modify $p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta})$ from (11) using a Gaussian approximation 
through moment matching. In addition, we can also 
form a Gaussian approximation of $p(P_{\mathrm{RX}}(\mathbf{u}_*) \mid \mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*)$ 
for prediction. We will term this approach Gaussian approximation 
GP (GAGP). The expressions that are obtained 
in the learning of GAGP, namely the expectation of mean 
and covariance functions, will be used later in the design of 
uncertain GP (described in Section V-C). 

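1) Learning: Given the training database $\{\mathbf{U}, \mathbf{y}\}$, the mean 
of $p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta})$ is given by 

$$\begin{aligned} \mathbb{E}[\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta}] &= \iint \mathbf{y}\, p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})\, p(\mathbf{X})\, \mathrm{d}\mathbf{X}\, \mathrm{d}\mathbf{y} \\ &= \int \left( \int \mathbf{y}\, p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})\, \mathrm{d}\mathbf{y} \right) p(\mathbf{X})\, \mathrm{d}\mathbf{X} \\ &= \int \boldsymbol{\mu}(\mathbf{X})\, p(\mathbf{X})\, \mathrm{d}\mathbf{X} \\ &= \boldsymbol{\mu}(\mathbf{U}), \end{aligned} \tag{18}$$

where $\boldsymbol{\mu}(\mathbf{U}) = [\mu(\mathbf{u}_1), \mu(\mathbf{u}_2), \ldots, \mu(\mathbf{u}_N)]^T$ and $\mu(\mathbf{u}_i) = 
\int \mu(\mathbf{x}_i)\, p(\mathbf{x}_i)\, \mathrm{d}\mathbf{x}_i$. The covariance matrix of $p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta})$ can 
be expressed as 

$$\begin{aligned} \mathrm{Cov}[\mathbf{y}, \mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta}] &= \iint \mathbf{y}\mathbf{y}^T p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})\, p(\mathbf{X})\, \mathrm{d}\mathbf{X}\, \mathrm{d}\mathbf{y} - \boldsymbol{\mu}(\mathbf{U})\boldsymbol{\mu}(\mathbf{U})^T \\ &= \int \left( \mathbf{K} + \boldsymbol{\mu}(\mathbf{X})\boldsymbol{\mu}(\mathbf{X})^T \right) p(\mathbf{X})\, \mathrm{d}\mathbf{X} - \boldsymbol{\mu}(\mathbf{U})\boldsymbol{\mu}(\mathbf{U})^T \\ &= \mathbf{K}_u + \boldsymbol{\Delta}, \end{aligned} \tag{19}$$

where $[\mathbf{K}_u]_{ij} = C_u(\mathbf{u}_i, \mathbf{u}_j) + \sigma_n^2\, \delta_{ij}$, in which 

$$C_u(\mathbf{u}_i, \mathbf{u}_j) = \iint C(\mathbf{x}_i, \mathbf{x}_j)\, p(\mathbf{x}_i)\, p(\mathbf{x}_j)\, \mathrm{d}\mathbf{x}_i\, \mathrm{d}\mathbf{x}_j \tag{20}$$

and $\boldsymbol{\Delta}$ is a diagonal matrix with entries 

$$[\boldsymbol{\Delta}]_{ii} = \int \mu^2(\mathbf{x}_i)\, p(\mathbf{x}_i)\, \mathrm{d}\mathbf{x}_i - \mu^2(\mathbf{u}_i). \tag{21}$$

We will refer to $\mu(\mathbf{u}_i)$ and $C_u(\mathbf{u}_i, \mathbf{u}_j)$ as the expected mean 
and expected covariance function. We can now express the 
likelihood function as $p(\mathbf{y} \mid \mathbf{U}, \boldsymbol{\theta}) \approx \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}(\mathbf{U}), \mathbf{K}_u + \boldsymbol{\Delta})$, so 
that $\boldsymbol{\theta}$ can be estimated by minimizing the negative log-likelihood 
function 

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \left\{ -\log \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}(\mathbf{U}), \mathbf{K}_u + \boldsymbol{\Delta}) \right\}. \tag{22}$$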

Remark 3. Learning in GAGP involves computation of the 
expected mean in (18) and (21), as well as the expected 
covariance function in (20). These integrals are generally again 
intractable, but there are cases where closed-form expressions 
exist El, El-. These will be discussed in detail in Section 
V-C. GAGP avoids the numerical problems present in MCGP 
and will hence generally be able to provide a good estimate 
of $\boldsymbol{\theta}$. 
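When no closed form is at hand, the expected mean $\mu(\mathbf{u}_i) = \int \mu(\mathbf{x}_i) p(\mathbf{x}_i)\,\mathrm{d}\mathbf{x}_i$ and the diagonal term $\int \mu^2(\mathbf{x}_i) p(\mathbf{x}_i)\,\mathrm{d}\mathbf{x}_i - \mu^2(\mathbf{u}_i)$ can be approximated by sampling. The sketch below does this for a single node under assumptions made for the example only: an isotropic Gaussian location distribution with mean `z` and standard deviation `s`, and the path-loss mean $\mu(\mathbf{x}) = L_0 - 10\eta\log_{10}\|\mathbf{x}\|$. The function name is hypothetical, and sampling stands in for the closed-form expressions, which are not reproduced here.

```python
import numpy as np

def expected_mean_and_delta(z, s, L0, eta, M=20000, seed=None):
    """Sample-based estimates of mu(u_i) and of the diagonal entry Delta_ii."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    # Samples x ~ N(z, s^2 I) from the (assumed Gaussian) location distribution
    x = z + s * rng.standard_normal((M, z.size))
    mu = L0 - 10.0 * eta * np.log10(np.linalg.norm(x, axis=1))
    # mu.var() equals mean(mu^2) - mean(mu)^2, i.e. the Delta_ii expression
    return mu.mean(), mu.var()
```

With the same location standard deviation, the estimated diagonal term is larger for a node close to the transmitter than for a distant one, because the path-loss curve is steeper there. This matches the observation that the output noise induced by location uncertainty depends on the local slope of the received power.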

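2) Prediction: Given the training database $\{\mathbf{U}, \mathbf{y}\}$
and $\hat{\boldsymbol{\theta}}$, we approximate the predictive distribution
$p(P_{\mathrm{RX}}(\mathbf{u}_*)|\mathbf{U},\mathbf{y},\hat{\boldsymbol{\theta}},\mathbf{u}_*)$ by a Gaussian with mean
$\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ and variance $V_{\mathrm{RX}}(\mathbf{u}_*)$. These are given by

$$\begin{aligned}
\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)
&= \mathbb{E}[P_{\mathrm{RX}}(\mathbf{u}_*)|\mathbf{U},\mathbf{y},\hat{\boldsymbol{\theta}},\mathbf{u}_*] \\
&= \int \bar{P}_{\mathrm{RX}}(\mathbf{x}_*)\, p(\mathbf{X})\, p(\mathbf{x}_*)\, d\mathbf{X}\, d\mathbf{x}_* \\
&= \mu(\mathbf{u}_*) + \sum_{i=1}^{N} \int \beta_i\, C(\mathbf{x}_*, \mathbf{x}_i)\, p(\mathbf{X})\, p(\mathbf{x}_*)\, d\mathbf{X}\, d\mathbf{x}_*.
\end{aligned} \tag{23}$$

Note that $\beta_i$ is itself a function of all X's and $\mathbf{x}_*$. Similarly,
$V_{\mathrm{RX}}(\mathbf{u}_*)$ is calculated as

$$V_{\mathrm{RX}}(\mathbf{u}_*) = \mathbb{E}[P_{\mathrm{RX}}^2(\mathbf{u}_*)|\mathbf{U},\mathbf{y},\hat{\boldsymbol{\theta}},\mathbf{u}_*] - \bar{P}_{\mathrm{RX}}(\mathbf{u}_*)^2 \tag{24}$$

$$\phantom{V_{\mathrm{RX}}(\mathbf{u}_*)} = \int \big( V_{\mathrm{RX}}(\mathbf{x}_*) + \bar{P}_{\mathrm{RX}}(\mathbf{x}_*)^2 \big)\, p(\mathbf{X})\, p(\mathbf{x}_*)\, d\mathbf{X}\, d\mathbf{x}_* - \bar{P}_{\mathrm{RX}}(\mathbf{u}_*)^2. \tag{25}$$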


Note that both $\bar{P}_{\mathrm{RX}}(\mathbf{x}_*)$ and $V_{\mathrm{RX}}(\mathbf{x}_*)$ are functions of $\mathbf{X}$ (see
(9)-(10)).

Remark 4. Prediction in GAGP requires complex integrals
to be solved in (23)-(25), for which no general closed-form
expressions are known. Hence, a reasonable approach is to
use GAGP to learn $\boldsymbol{\theta}$ and MCGP for prediction.

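Remark 5. In case training locations are known, i.e., $\mathbf{U} \in \mathcal{X}$,
(23) reverts to

$$\bar{P}_{\mathrm{RX}}(\mathbf{u}_*) = \mu(\mathbf{u}_*) + \sum_{i=1}^{N} \beta_i \int C(\mathbf{x}_*,\mathbf{x}_i)\, p(\mathbf{x}_*)\, d\mathbf{x}_* \tag{26}$$

and (25) becomes

$$\begin{aligned}
V_{\mathrm{RX}}(\mathbf{u}_*)
&= k_{**} - \sum_{i,j=1}^{N} [\mathbf{K}^{-1}]_{ij} \int C(\mathbf{x}_*,\mathbf{x}_i)\, C(\mathbf{x}_*,\mathbf{x}_j)\, p(\mathbf{x}_*)\, d\mathbf{x}_* \\
&\quad + \int \mu(\mathbf{x}_*)^2\, p(\mathbf{x}_*)\, d\mathbf{x}_*
 + 2 \sum_{i=1}^{N} \beta_i \Big( \int \mu(\mathbf{x}_*)\, C(\mathbf{x}_*,\mathbf{x}_i)\, p(\mathbf{x}_*)\, d\mathbf{x}_* \Big) \\
&\quad + \sum_{i,j=1}^{N} \beta_i \beta_j \int C(\mathbf{x}_*,\mathbf{x}_i)\, C(\mathbf{x}_*,\mathbf{x}_j)\, p(\mathbf{x}_*)\, d\mathbf{x}_*
 - \bar{P}_{\mathrm{RX}}(\mathbf{u}_*)^2,
\end{aligned} \tag{27}$$

both of which can be computed in closed form, under some
conditions, when $\mu(\mathbf{x})$ is constant in $\mathbf{x}$ [18, Section 3.4]. When
both $\mathbf{U} \in \mathcal{X}$ and $\mathbf{u}_* \in \mathcal{X}$, GAGP reverts to cGP.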


C. Uncertain GP 

While GAGP avoids the learning problems inherent to
MCGP, prediction is generally intractable. Hence, GAGP is
not a fully coherent approach to deal with location uncertainty.
To address this, we consider a new type of GP (uGP), which
operates directly on the location distributions, rather than
on the locations. uGP involves a mean function $\mu_{\mathrm{uGP}}(\mathbf{u}_i): \mathcal{U} \to \mathbb{R}$
and a positive semidefinite covariance function
$C_{\mathrm{uGP}}(\mathbf{u}_i,\mathbf{u}_j): \mathcal{U} \times \mathcal{U} \to \mathbb{R}^{+}$, which considers as inputs
$\mathbf{u} \in \mathcal{U}$ and as outputs $y \in \mathbb{R}$. In other words,

$$P_{\mathrm{RX}}(\mathbf{u}_i) \sim \mathcal{GP}\big(\mu_{\mathrm{uGP}}(\mathbf{u}_i),\, C_{\mathrm{uGP}}(\mathbf{u}_i, \mathbf{u}_j)\big). \tag{28}$$

The mean function is given by $\mu_{\mathrm{uGP}}(\mathbf{u}_i) = \mathbb{E}_{\mathbf{x}_i}\big[\mathbb{E}_{\Psi(\mathbf{x}_i)}[P_{\mathrm{RX}}(\mathbf{x}_i)]\big]$,
already introduced as the expected
mean function in (18). However, for the mean function to
be useful in a GP context, it should be available in closed
form. As in cGP, we have significant freedom in our choice
of covariance function. Apart from all technical conditions
on the covariance function as described in [?], it is desirable
to have a covariance function that (i) is available in closed
form; (ii) leads to decreasing correlation with increasing input
uncertainty (even when both inputs have the same mean); (iii)
can account for varying amounts of input uncertainty; (iv)
reverts to a covariance function of the form (4) when $\mathbf{u} \in \mathcal{X}$;
(v) does not depend on the mean function $\mu(\mathbf{x})$. We will
now describe the mean function $\mu_{\mathrm{uGP}}(\mathbf{u}_i)$ and covariance
function $C_{\mathrm{uGP}}(\mathbf{u}_i,\mathbf{u}_j)$ in detail.

The mean function: According to the law of iterated expectations,
the mean function $\mu(\mathbf{u}_i)$ is expressed as

$$\mu(\mathbf{u}_i) = L_0 - 10\,\eta\, \mathbb{E}_{\mathbf{x}_i}\big[\log_{10}(\|\mathbf{x}_i\|)\big]. \tag{29}$$

While there is no closed-form expression available for (29),
we can form a polynomial approximation $\sum_{j=0}^{J} a_j \|\mathbf{x}_i\|^j \approx \log_{10}(\|\mathbf{x}_i\|)$,
where the coefficients $a_j$ are found by least
squares minimization. For a given range of $\|\mathbf{x}_i\|$, this approximation
can be made arbitrarily close by increasing the order $J$.
When $p(\|\mathbf{x}_i\|)$ is approximately Gaussian (which may be the
case for $\|\mathbf{x}_i\| \gg 0$), $\mu(\mathbf{u}_i) \approx L_0 - 10\,\eta \sum_{j=0}^{J} a_j\, \mathbb{E}_{\mathbf{x}_i}\big[\|\mathbf{x}_i\|^j\big]$
can be evaluated in closed form, since all Gaussian moments
are known. See Appendix A for details on the approximation.
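The polynomial approximation described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the distance $d_i = \|\mathbf{x}_i\|$ is approximately Gaussian with mean `m` and standard deviation `s`, and the function names, the fitting range `[d_lo, d_hi]`, the order, and the default values of `L0` and `eta` are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_raw_moments(m, s, order):
    """Raw moments E[d^j], j = 0..order, of d ~ N(m, s^2) via the standard recursion."""
    mom = [1.0, m]
    for j in range(2, order + 1):
        mom.append(m * mom[j - 1] + (j - 1) * s**2 * mom[j - 2])
    return np.array(mom[: order + 1])

def expected_mean(m, s, L0=-10.0, eta=2.5, order=6, d_lo=10.0, d_hi=100.0):
    """Approximate mu(u) = L0 - 10*eta*E[log10(d)] with a least-squares polynomial.

    The range [d_lo, d_hi] and the order are illustrative assumptions.
    """
    d = np.linspace(d_lo, d_hi, 2000)
    # Fit in the scaled variable t = d / d_hi to keep the fit well conditioned.
    coeffs = np.polynomial.polynomial.polyfit(d / d_hi, np.log10(d), order)
    # If d ~ N(m, s^2), then t ~ N(m / d_hi, (s / d_hi)^2).
    moments = gaussian_raw_moments(m / d_hi, s / d_hi, order)
    return L0 - 10.0 * eta * float(coeffs @ moments)
```

The fit is only meaningful when most of the probability mass of the distance lies inside the fitting range; outside it the polynomial extrapolates poorly.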

The covariance function: While any covariance function
meeting the criteria (i)-(v) listed above can be chosen, a
natural choice is (see Section IV.A)

$$\begin{aligned}
C_{\mathrm{uGP}}(\mathbf{u}_i,\mathbf{u}_j)
&= \mathrm{Cov}[P_{\mathrm{RX}}(\mathbf{x}_i), P_{\mathrm{RX}}(\mathbf{x}_j)|\mathbf{u}_i,\mathbf{u}_j] \\
&= \mathrm{Cov}[y_i, y_j|\mathbf{U}, \boldsymbol{\theta}] - \sigma_n^2 \delta_{ij}.
\end{aligned} \tag{30}$$

Unfortunately, as we can see from (19), this choice does not
satisfy criterion (v). An alternative choice is the expected
covariance function $C_u(\mathbf{u}_i, \mathbf{u}_j)$ from (20). This choice clearly
satisfies criteria (ii), (iii), (iv), and (v). To satisfy (i), we
can select appropriate covariance functions, tailored to the
distributions $p(\mathbf{x}_i)$, or appropriate distributions $p(\mathbf{x}_i)$ for a
given covariance function. Examples include:

• Polynomial covariance functions for Gaussian $p(\mathbf{x}_i)$ [?], [?].

• Covariance functions of the form (4) with $p = 1$, $\mathbf{x}_i \in \mathbb{R}$,
for Laplacian $p(\mathbf{x}_i)$.

• Covariance functions of the form (4) with $p = 2$, $\mathbf{x}_i \in \mathbb{R}^{D}$,
for Gaussian $p(\mathbf{x}_i)$ (i.e., $p(\mathbf{x}_i) = \mathcal{N}(\mathbf{x}_i; \mathbf{z}_i, \boldsymbol{\Sigma}_i)$). The
expected covariance function is then given by [?], [?]


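$$\begin{aligned}
C_u(\mathbf{u}_i,\mathbf{u}_j) = \delta_{ij}\, \sigma_{\mathrm{proc}}^2
&+ \sigma_\Psi^2\, \big| \mathbf{I} + 2 d_c^{-2} (\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)(1 - \delta_{ij}) \big|^{-1/2} \\
&\times \exp\!\Big( - d_c^{-2}\, (\mathbf{z}_i - \mathbf{z}_j)^{\mathsf{T}} \big( \mathbf{I} + 2 d_c^{-2} (\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j) \big)^{-1} (\mathbf{z}_i - \mathbf{z}_j) \Big).
\end{aligned} \tag{31}$$

Note that the factor $\big| \mathbf{I} + 2 d_c^{-2} (\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)(1 - \delta_{ij}) \big|^{-1/2}$
ensures that inputs $i \neq j$ with the same mean (i.e.,
$\mathbf{z}_i = \mathbf{z}_j$) exhibit lower correlation with increasing
uncertainty. The factor $\big( \mathbf{I} + 2 d_c^{-2} (\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j) \big)^{-1}$ ensures that
the measurements taken at locations with low uncertainty
(smaller than $d_c$) can be explained by a large value of
$d_c$, while for measurements taken at locations with high
uncertainty, $C_u(\mathbf{u}_i, \mathbf{u}_j)$ will be small and decreasing with
increasing uncertainty.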
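To make the expected covariance for Gaussian inputs concrete, the sketch below evaluates it in closed form. It is an illustration under a stated assumption, not the authors' code: the base kernel is taken to be $\sigma_\Psi^2 \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / d_c^2)$ plus $\sigma_{\mathrm{proc}}^2$ on the diagonal, and the function name and argument names are hypothetical. The `same_input` flag plays the role of $\delta_{ij}$.

```python
import numpy as np

def expected_se_cov(z_i, S_i, z_j, S_j, d_c, sigma_psi2,
                    sigma_proc2=0.0, same_input=False):
    """Expected value of sigma_psi2 * exp(-||x_i - x_j||^2 / d_c^2) under
    independent x_i ~ N(z_i, S_i) and x_j ~ N(z_j, S_j).

    Assumed kernel form; for i == j the prior variance is returned.
    """
    z_i = np.atleast_1d(z_i).astype(float)
    z_j = np.atleast_1d(z_j).astype(float)
    if same_input:  # delta_ij = 1: no averaging over a pair of distinct inputs
        return sigma_psi2 + sigma_proc2
    S = np.atleast_2d(S_i) + np.atleast_2d(S_j)
    A = np.eye(z_i.size) + 2.0 * S / d_c**2
    diff = z_i - z_j
    quad = diff @ np.linalg.solve(A, diff) / d_c**2
    return sigma_psi2 * np.linalg.det(A) ** -0.5 * np.exp(-quad)
```

With zero input covariances the function falls back to the base kernel, which corresponds to criterion (iv); growing covariances shrink the determinant factor, which corresponds to criterion (ii).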

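1) Learning: Given the training database $\{\mathbf{U}, \mathbf{y}\}$ and
choosing $\mu_{\mathrm{uGP}}(\mathbf{u}_i) = \mu(\mathbf{u}_i)$ and $C_{\mathrm{uGP}}(\mathbf{u}_i, \mathbf{u}_j) = C_u(\mathbf{u}_i, \mathbf{u}_j)$,
the model parameters are found by minimizing the negative
log-likelihood function

$$\begin{aligned}
\hat{\boldsymbol{\theta}} &= \arg\min_{\boldsymbol{\theta}} \big\{ -\log p(\mathbf{y}|\mathbf{U}, \boldsymbol{\theta}) \big\} \\
&= \arg\min_{\boldsymbol{\theta}} \big\{ -\log \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}(\mathbf{U}), \mathbf{K}_u) \big\}.
\end{aligned} \tag{32}$$

Note that in contrast to GAGP, we have constructed uGP
so that $\boldsymbol{\mu}(\mathbf{U})$ and $\mathbf{K}_u$ are available in closed form, making
numerical minimization tractable.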
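The objective being minimized is the negative logarithm of a multivariate Gaussian density. A minimal sketch of how such an objective could be evaluated stably is given below; the function name is hypothetical, and the Cholesky-based evaluation is a common numerical choice rather than something stated in the text.

```python
import numpy as np

def gaussian_nll(y, mean, K):
    """Negative log of N(y; mean, K), evaluated through a Cholesky factor of K."""
    r = np.asarray(y, float) - np.asarray(mean, float)
    L = np.linalg.cholesky(K)             # K = L L^T, requires K positive definite
    alpha = np.linalg.solve(L, r)         # L^{-1} r, so r^T K^{-1} r = alpha^T alpha
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (alpha @ alpha + log_det + r.size * np.log(2.0 * np.pi))
```

In a learning loop, one would rebuild the mean vector and covariance matrix for each candidate hyperparameter value and keep the candidate with the smallest returned value.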

Remark 6. Learning of uGP (32) corresponds to the case of
learning (22) in GAGP for $\boldsymbol{\Delta} = \mathbf{0}$ (e.g., for constant mean
processes).

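2) Prediction: Let $\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ be the mean and $V_{\mathrm{RX}}(\mathbf{u}_*)$
be the variance of the posterior predictive distribution
$p(P_{\mathrm{RX}}(\mathbf{u}_*)|\mathbf{U},\mathbf{y},\hat{\boldsymbol{\theta}},\mathbf{u}_*)$ of uGP with uncertain training
and test locations; then $p(P_{\mathrm{RX}}(\mathbf{u}_*)|\mathbf{U}, \mathbf{y}, \hat{\boldsymbol{\theta}}, \mathbf{u}_*) = \mathcal{N}(P_{\mathrm{RX}}(\mathbf{u}_*); \bar{P}_{\mathrm{RX}}(\mathbf{u}_*), V_{\mathrm{RX}}(\mathbf{u}_*))$.
The expressions for
$\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ and $V_{\mathrm{RX}}(\mathbf{u}_*)$ are now in standard GP form:

$$\bar{P}_{\mathrm{RX}}(\mathbf{u}_*) = \mu(\mathbf{u}_*) + \mathbf{k}_{u*}^{\mathsf{T}} \mathbf{K}_u^{-1} \big( \mathbf{y} - \boldsymbol{\mu}(\mathbf{U}) \big) \tag{33}$$

$$V_{\mathrm{RX}}(\mathbf{u}_*) = k_{u**} - \mathbf{k}_{u*}^{\mathsf{T}} \mathbf{K}_u^{-1} \mathbf{k}_{u*}, \tag{34}$$

where $\mathbf{k}_{u*}$ is the $N \times 1$ vector of cross-covariances $C_u(\mathbf{u}_*, \mathbf{u}_i)$
between the received power at the test distribution $\mathbf{u}_*$ and at
the training distribution $\mathbf{u}_i$, and $k_{u**}$ is the a priori variance
$C_u(\mathbf{u}_*, \mathbf{u}_*)$.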
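Since the predictive mean and variance are in standard GP form, they can be computed with two linear solves. The sketch below is illustrative only: the function name and argument names are hypothetical, and the caller is assumed to have already built the covariance matrix, the cross-covariance vector, and the mean values.

```python
import numpy as np

def gp_predict(K_u, k_star, k_star_star, y, mu_train, mu_star):
    """Predictive mean and variance in the standard GP form of (33)-(34)."""
    K_u = np.asarray(K_u, float)
    k_star = np.asarray(k_star, float)
    resid = np.asarray(y, float) - np.asarray(mu_train, float)
    L = np.linalg.cholesky(K_u)
    # Solve K_u w = (y - mu(U)) and K_u v = k_star through the Cholesky factor.
    w = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    v = np.linalg.solve(L.T, np.linalg.solve(L, k_star))
    mean = mu_star + k_star @ w
    var = k_star_star - k_star @ v
    return float(mean), float(var)
```

For a single training point with prior variance 4, noise variance 1, and cross-covariance 2, the predictive variance shrinks from 4 to 3.2, which shows how one observation reduces the uncertainty at a correlated test input.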

Remark 7. In case the training locations are known, i.e.,
$\mathbf{U} \in \mathcal{X}$, the mean $\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ and the variance $V_{\mathrm{RX}}(\mathbf{u}_*)$ can be
obtained from the expressions (33) and (34), respectively, by
setting $\boldsymbol{\Sigma}_i = \mathbf{0}$, $\forall i \in \{1, 2, \ldots, N\}$. Furthermore, the resulting
mean $\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ is exactly the same as (26), obtained in GAGP.
However, due to a different choice of covariance function, the
predicted variance $V_{\mathrm{RX}}(\mathbf{u}_*)$ is different from (27).

Remark 8. When the test location is known, i.e., $\mathbf{u}_* \in \mathcal{X}$, the
mean $\bar{P}_{\mathrm{RX}}(\mathbf{x}_*)$ and the variance $V_{\mathrm{RX}}(\mathbf{x}_*)$ are obtained from
(33) and (34) by setting $\boldsymbol{\Sigma}_* = \mathbf{0}$.

D. Unified View 

We are now ready to recap the main differences between
cGP and uGP, and to provide a unified view of the four
methods (cGP, MCGP, GAGP, and uGP). Fig. 3 describes the
main processes in uGP and cGP, along with the inputs and
outputs during the learning and prediction processes. The four
methods are depicted in Fig. 4: all four methods revert to cGP
when training and predictions occur in $\mathcal{X}$, i.e., when there is
no uncertainty about the locations. MCGP is able to consider



Figure 3. Learning and prediction phases of cGP and uGP. The difference in
learning in uGP compared to cGP is that it considers the location uncertainty
of the nodes. The estimated model parameters $\hat{\boldsymbol{\theta}}$ are derived during the
learning phase and are generally different in cGP compared to uGP. The
mean $\bar{P}_{\mathrm{RX}}(\mathbf{z}_*)$ and variance $V_{\mathrm{RX}}(\mathbf{z}_*)$ of the posterior predictive distribution
in cGP correspond to a location $\mathbf{z}_*$ extracted from $\mathbf{u}_*$, which in turn
represents $p(\mathbf{x}_*)$. In contrast, the mean $\bar{P}_{\mathrm{RX}}(\mathbf{u}_*)$ and variance $V_{\mathrm{RX}}(\mathbf{u}_*)$
of the posterior predictive distribution in uGP pertain to the entire location
distribution represented by $\mathbf{u}_*$.





Figure 4. Relation between cGP, MCGP, GAGP, and uGP. All methods are 
equivalent when the input is limited to X (grey shaded area). 


general input distributions in $\mathcal{U}$, but leads to non-Gaussian
output distributions. Through a Gaussian approximation of
these output distributions, GAGP can consider general inputs
and directly determine a Gaussian output distribution. Both
of these approaches (MCGP and GAGP) have in common
that they treat the process with input $\mathbf{x} \in \mathcal{A}$ as a GP. In
contrast, uGP treats the process with input $\mathbf{u} \in \mathcal{U}$ as a GP.
This allows for a direct mapping from inputs in $\mathcal{U}$ to Gaussian
output distributions. In terms of tractability for learning and
prediction, the four methods are compared in Table I. We see
that among all four methods, uGP combines tractability with
good performance.

Table I
Comparison of tractability for cGP, MCGP, GAGP, and uGP in learning and prediction.

Method   Learning                   Prediction
------   ------------------------   -------------------------
cGP      tractable, poor quality    closed-form, poor quality
MCGP     complex, poor quality      tractable
GAGP     tractable in some cases    intractable
uGP      tractable by design        closed-form

VI. Numerical Results and Discussion

In this section, we show learning and prediction results
of cGP, uGP, and MCGP with uncertainty in training or test
locations. In Section VI.D we describe a resource allocation
problem, where communication rates are predicted at future
locations using cGP and uGP, in the presence of location
uncertainty during training. The numerical analysis carried
out in this section is based on simulated channel measurements
according to the model outlined in Section II.


Table II
Simulation Parameters

Parameter        Value      Parameter        Value
--------------   --------   --------------   --------
$\eta$           2.5        $M$              300
?                0.01       $L_0$            -10 dBm
$d_c$            15 m       $\sigma_\Psi$    10 dB


A. Simulation Setup 

A geographical region $\mathcal{A}$ is considered and a base station
is placed at the origin. A one-dimensional radio propagation
field is generated with sampling locations at a resolution
of 0.25 m using an exponential covariance function
$C_{\mathrm{ret}}(\mathbf{x}_i,\mathbf{x}_j) = \sigma_\Psi^2 \exp\big( -\|\mathbf{x}_i - \mathbf{x}_j\| / d_c \big)$, corresponding to
the Gudmundson model. Small-scale fading is assumed to
have been averaged out*. The simulation parameters used to
obtain the numerical results are given in Table II. We assume
isotropic localization errors, so that $\boldsymbol{\Sigma}_i = \sigma_i^2 \mathbf{I}$. To capture the
effect of heterogeneous location errors, we draw the location
error standard deviations from an exponential distribution,
i.e., $\sigma_i \sim \mathrm{Exp}(\lambda)$, where $\lambda$ is the average location error
standard deviation. For cGP and MCGP, in order not to give
uGP any unfair advantage, we use a covariance function
of the form (4) with $p = 1$, which matches the true
covariance function $C_{\mathrm{ret}}(\mathbf{x}_i, \mathbf{x}_j)$. For uGP, we use (31). Since
uGP exhibits a mismatch in the covariance function, we absorb
this mismatch in $\sigma_{\mathrm{proc}}$, which is learned offline (more on this
in Appendix B). We assume nodes know $\sigma_n$ and $L_0$, which
can be inferred using standard methods [?], [?], [?], so they
are not included in the learning process.
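The setup above can be mimicked with a short simulation. The sketch below is a rough illustration under assumptions that are not taken from the text: the grid is placed on a hypothetical interval `[x_min, x_max]` away from the origin so that the logarithm of the distance is defined, measurement noise is omitted, and all function and parameter names are invented for the example.

```python
import numpy as np

def simulate_training_set(n_train=100, x_min=10.0, x_max=110.0, step=0.25,
                          d_c=15.0, sigma_psi=10.0, L0=-10.0, eta=2.5,
                          lam=5.0, seed=0):
    """Draw a 1-D shadowing field and noisy training locations (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.arange(x_min, x_max + step / 2, step)        # true grid, 0.25 m resolution
    C = sigma_psi**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / d_c)
    # Small diagonal jitter keeps the Cholesky factorization numerically safe.
    psi = np.linalg.cholesky(C + 1e-9 * np.eye(x.size)) @ rng.standard_normal(x.size)
    p_rx = L0 - 10.0 * eta * np.log10(x) + psi          # received power in dBm
    idx = rng.choice(x.size, size=n_train, replace=False)
    sigma_i = rng.exponential(lam, size=n_train)        # heterogeneous error std, mean lam
    z = x[idx] + sigma_i * rng.standard_normal(n_train) # reported (uncertain) locations
    return x, p_rx, idx, z, sigma_i
```

The training database would then consist of the pairs `(z[k], sigma_i[k])` describing each location distribution, together with the measurements `p_rx[idx]`.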

B. Learning Under Location Uncertainty 

Fig. 5 depicts the impact of location uncertainty on the
learning of hyperparameters $[d_c, \sigma_\Psi, \sigma_{\mathrm{proc}}, \eta]$ for cGP, uGP,
and MCGP. The learning of the hyperparameters is detailed in
Appendix B.

*In the case small-scale fading is not averaged out, the proposed framework 
cannot be applied. 


1) cGP: We first consider a variant of cGP, denoted as
cGP-no-proc, in which $\sigma_{\mathrm{proc}}$ is fixed to zero. In cGP-no-proc,
when $\lambda = 0$, the estimate $\hat{d}_c$ is non-zero. However, it can be
observed in Fig. 5(a) that, as $\lambda$ increases, $\hat{d}_c$ decreases
quickly to zero. Hence, cGP-no-proc will model the GP as a
white process with high variance $\hat{\sigma}_\Psi^2$, and thus cannot handle
the location uncertainty. On the other hand, in cGP where we
estimate $\sigma_{\mathrm{proc}}$, $\hat{\sigma}_{\mathrm{proc}}$ absorbs part of the location uncertainty (see
Fig. 5(c)). Consequently, the part of the observations that must
be explained through $\sigma_\Psi^2$ is reduced, leading to a reduction
of $\hat{\sigma}_\Psi$ with $\lambda$. Because of this, cGP treats the measurements
as a slowly varying process, and therefore $\hat{d}_c$ increases with
$\lambda$. An interesting observation is that the error bars for $\hat{d}_c$ also
increase with $\lambda$. Hence, among cGP-no-proc and cGP, only
cGP can reasonably deal with location uncertainty.

2) MCGP: The behavior is similar to that of cGP, i.e.,
an increase in $\hat{d}_c$, and a decrease in $\hat{\sigma}_\Psi$ when increasing $\lambda$.
However, $\hat{\sigma}_\Psi$ decreases more quickly with $\lambda$ when compared
to cGP. These effects can be attributed to two causes: first,
the inherent problem of drawing a finite number of
samples, as detailed at the end of Section IV.A; second,
the fluctuations in the estimated path loss exponent $\hat{\eta}$ with
increasing $\lambda$ (see Fig. 5(d)). The error bars of the estimates
in this case are even higher than in cGP. As expected, MCGP
is not suitable for learning.

3) uGP: As mentioned before, in uGP $\sigma_{\mathrm{proc}}$ is determined
offline. The uGP model has the capability to absorb the
location uncertainty into the covariance function. Due to this
flexibility, it can handle higher values of $\lambda$ and still maintain
an almost constant $\hat{d}_c$ and $\hat{\sigma}_\Psi$ as $\lambda$ increases. For a fair
comparison with cGP, we also consider the case where $\sigma_{\mathrm{proc}}$
is estimated as part of the learning, referred to as uGP-proc.
It can be observed in Fig. 5(c) that $\hat{\sigma}_{\mathrm{proc}}$ increases as
$\lambda$ increases. When comparing uGP-proc to uGP, we observe
a lower value of $\hat{\sigma}_\Psi$ and higher values of $\hat{d}_c$ and $\hat{\sigma}_{\mathrm{proc}}$ for a
particular value of $\lambda$. From this, we conclude that uGP should
be preferred over uGP-proc, as it can explain the observations
with a smaller $\sigma_{\mathrm{proc}}$ and leads to simpler optimization. Finally,
note that the error bars of the uGP estimates are relatively
small when compared to cGP.

C. Prediction Under Location Uncertainty 

Four cases can be considered, depending on whether training
or testing inputs are in $\mathcal{X}$ or $\mathcal{U}$. We will focus on the case
where either training or test locations are uncertain, but not
both. From these, the behavior when both training and testing
inputs are in $\mathcal{U}$ can be easily understood: only uGP can give
reasonable performance among cGP, MCGP, and uGP, as the
estimates of $\boldsymbol{\theta}$ in cGP and MCGP are of poor quality.

Figure 5. Impact of location uncertainty on learning the hyperparameters using cGP, uGP, and MCGP. The hyperparameters are estimated for each value of
the mean location error standard deviation and for 40 realizations of the channel field. Results shown are the mean estimate of the hyperparameters and error
bars with one standard deviation. Impact of location uncertainty is shown when estimating: (a) $d_c$, (b) $\sigma_\Psi$, (c) $\sigma_{\mathrm{proc}}$, (d) $\eta$.

1) Uncertain training locations and certain testing locations:
In this case $\mathbf{u}_i \in \mathcal{U}$ and $\mathbf{u}_* \in \mathcal{X}$. Fig. 6(a) depicts
the prediction results in terms of the predictive mean and
predictive standard deviation (shown as shaded areas) for a
particular realization of the channel field. It can be observed
that uGP is able to predict the received power
better than cGP and MCGP. uGP is able to estimate the underlying
channel parameters better with the expected covariance
function, which takes into account the location uncertainty of
the nodes. In turn, this means that uGP can track the faster
variations in the channel. cGP tries to model the true function
with a slowly varying process due to a very high $\hat{d}_c$. Furthermore,
cGP has higher uncertainty in predictions due to a high $\hat{\sigma}_{\mathrm{proc}}$
(see Fig. 5(c)). On the other hand, MCGP has slightly better
prediction performance (the standard deviation is not shown,
but is slightly smaller than for cGP) compared to cGP due
to the averaging by drawing samples from the distribution of
the uncertain training locations. Averaging the prediction error
over multiple channel realizations, Fig. 6(b) shows the mean
squared error (MSE) of the received power prediction of cGP
and uGP with respect to $\lambda$ (MCGP is not shown due to its
similar performance to cGP). uGP clearly outperforms cGP
(except for $\lambda = 0$) due to its better tracking of the true channel
(see Fig. 6(a)) despite uncertainty on the training locations.
The higher MSE of uGP in the case of $\lambda = 0$ is
due to its kernel mismatch.


2) Certain training locations and uncertain testing locations:
In this case $\mathbf{u}_i \in \mathcal{X}$ and $\mathbf{u}_* \in \mathcal{U}$ (with a constant
location error standard deviation $\sigma$ in m). Now the performance
must be assessed with respect to the expected received
power $P_{\mathrm{RX,avg}}(\mathbf{u}_*) = \int P_{\mathrm{RX}}(\mathbf{x}_*)\, p(\mathbf{x}_*)\, d\mathbf{x}_*$, where $p(\mathbf{x}_*) = \mathcal{N}(\mathbf{z}_*, \sigma^2 \mathbf{I})$,
in which $\mathbf{z}_*$ is the mean of the distribution described
by $\mathbf{u}_*$. An example is shown in Fig. 7(a), depicting $P_{\mathrm{RX,avg}}$ as
a function of $\mathbf{z}_*$, as well as the predictions from cGP, MCGP,
and uGP. It can be observed that uGP and MCGP follow
$P_{\mathrm{RX,avg}}$ well. Specifically, MCGP tracks $P_{\mathrm{RX,avg}}$ quite closely as
it is near-optimal in this case. In contrast, cGP follows the
actual received power at $\mathbf{z}_*$, rather than the averaged power.
This leads to fast variations in cGP, which are not present in
uGP and MCGP. Fig. 7(b) shows the MSE of the received
power prediction of cGP, MCGP, and uGP with respect to
$\sigma$ when averaging the prediction error over multiple channel
realizations. As expected, MCGP has a lower MSE than
uGP and cGP. However, uGP performs better than cGP for
all considered $\sigma$, except $\sigma = 0$ (due to kernel mismatch).

























































Figure 6. Performance comparison of cGP, MCGP, and uGP under uncertain training and certain testing locations. Inset (a): received power prediction using
uncertain training locations with an average location error of $\lambda = 8$ m and certain test locations for a single realization of a channel field. The shaded area (grey
for cGP and blue for uGP) depicts the pointwise predictive mean plus and minus the predictive standard deviation. Inset (b): MSE performance of cGP and uGP
as a function of the average location error standard deviation $\lambda$. For each value of $\lambda$, the MSE is averaged over 50 realizations of the channel field; shown
are the mean of the MSE and error bars with one standard deviation. The MSE is calculated as $\frac{1}{|\mathcal{T}|} \sum_{\mathbf{x}_* \in \mathcal{T}} \big( P_{\mathrm{RX}}(\mathbf{x}_*) - \bar{P}_{\mathrm{RX}}(\mathbf{x}_*) \big)^2$, where $\mathcal{T}$ is the set
of test locations and $|\mathcal{T}|$ denotes its cardinality.




Figure 7. Performance comparison of cGP, MCGP, and uGP under certain training and uncertain testing locations. Inset (a): received power prediction using
certain training and uncertain test locations with a constant location error standard deviation $\sigma = 5$ m for a single realization of a channel field. Inset (b): MSE
performance of cGP, MCGP, and uGP as a function of the constant location error standard deviation $\sigma$ (in m) on test locations. For each value
of $\sigma$, the MSE is averaged over 50 realizations of the channel field; shown are the mean of the MSE and error bars with one standard deviation. The MSE is calculated as
$\frac{1}{|\mathcal{T}^{u}|} \sum_{\mathbf{u}_* \in \mathcal{T}^{u}} \big( P_{\mathrm{RX,avg}}(\mathbf{u}_*) - \bar{P}_{\mathrm{RX}}(\mathbf{u}_*) \big)^2$, where $\mathcal{T}^{u}$ is the set of test location distributions and $|\mathcal{T}^{u}|$ denotes its cardinality.


Furthermore, the performance of uGP is very close to that of 
MCGP. 

D. Resource Allocation Example 

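1) Scenario: In this section, we compare cGP and uGP for
a simple proactive resource allocation scenario. We consider a
user moving through a region $\mathcal{A}$ and predict the CQM at each
location. The supported rate, expressed in bits per channel use
(bpu), for a user at location $\mathbf{x}_*$ is defined as

$$r(\mathbf{x}_*) = \log_2\big(1 + \mathrm{SNR}(\mathbf{x}_*)\big), \tag{35}$$

where $\mathrm{SNR}(\mathbf{x}_*) = P_{\mathrm{RX}}^{\mathrm{lin}}(\mathbf{x}_*) / W^{\mathrm{lin}}$ is the signal-to-noise ratio
at location $\mathbf{x}_*$, $W^{\mathrm{lin}}$ is the receiver thermal noise and $P_{\mathrm{RX}}^{\mathrm{lin}}(\mathbf{x}_*)$
is the received power, both measured in linear scale. The
average rate in the region $\mathcal{A}$, denoted as $\bar{r}$, is defined as

$$\bar{r} = \frac{1}{|\mathcal{A}|} \int_{\mathcal{A}} r(\mathbf{x}_*)\, d\mathbf{x}_*, \tag{36}$$

where $|\mathcal{A}|$ denotes the area of the region $\mathcal{A}$. The predicted rate for
a user at a future location $\mathbf{x}_*$, based on the predicted CQM
values $(\bar{P}_{\mathrm{RX}}(\mathbf{x}_*), V_{\mathrm{RX}}(\mathbf{x}_*))$, is defined as

$$r(\mathbf{x}_*, \alpha) = \log_2\big(1 + \mathrm{SNR}(\mathbf{x}_*, \alpha)\big), \tag{37}$$

where $\alpha > 0$ is a confidence parameter,
$\mathrm{SNR}(\mathbf{x}_*, \alpha) = P_{\mathrm{RX}}^{\mathrm{lin}}(\mathbf{x}_*, \alpha) / W^{\mathrm{lin}}$ and
$P_{\mathrm{RX}}(\mathbf{x}_*, \alpha) = 10 \log_{10}\big(P_{\mathrm{RX}}^{\mathrm{lin}}(\mathbf{x}_*, \alpha)\big) = \bar{P}_{\mathrm{RX}}(\mathbf{x}_*) - \alpha \big(V_{\mathrm{RX}}(\mathbf{x}_*)\big)^{1/2}$.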




















Figure 8. Resource allocation example for cGP and uGP with two different values of the localization error standard deviation ($\lambda \in \{0, 10\}$ m) and for different
values of the confidence parameter $\alpha$. The results are averaged for each value of $\lambda$ over 50 channel realizations. Inset (a): the average effective rate $\bar{r}^{\mathrm{eff}}(\alpha)$; inset (b):
the fraction of undelivered bits $U(\alpha)$.


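2) Performance measure: The user moves through the
environment according to a known trajectory. The base station
allocates bits to each future location, proportional to $r(\mathbf{x}_*, \alpha)$.
When the user is at location $\mathbf{x}_*$, only a fraction of the
bits, proportional to $\min(r(\mathbf{x}_*, \alpha), r(\mathbf{x}_*))$, would be delivered.
Therefore, the effective rate $r^{\mathrm{eff}}(\mathbf{x}_*, \alpha)$ for the user at location
$\mathbf{x}_*$ is

$$r^{\mathrm{eff}}(\mathbf{x}_*, \alpha) = \min\big(r(\mathbf{x}_*, \alpha),\, r(\mathbf{x}_*)\big). \tag{38}$$

The average effective rate $\bar{r}^{\mathrm{eff}}(\alpha)$ for a given confidence level
$\alpha$ is then computed as the spatial average of $r^{\mathrm{eff}}(\mathbf{x}_*, \alpha)$ over
the region $\mathcal{A}$:

$$\bar{r}^{\mathrm{eff}}(\alpha) = \frac{1}{|\mathcal{A}|} \int_{\mathcal{A}} r^{\mathrm{eff}}(\mathbf{x}_*, \alpha)\, d\mathbf{x}_*. \tag{39}$$

When $r(\mathbf{x}_*, \alpha) > r(\mathbf{x}_*)$, a part of the allocated bits cannot
be delivered. The total fraction of undelivered bits over the
environment is given by

$$U(\alpha) = \frac{\int_{\mathcal{A}} \big( r(\mathbf{x}_*, \alpha) - r^{\mathrm{eff}}(\mathbf{x}_*, \alpha) \big)\, d\mathbf{x}_*}{\int_{\mathcal{A}} r(\mathbf{x}_*, \alpha)\, d\mathbf{x}_*} \in [0, 1). \tag{40}$$

Hence, $\bar{r}^{\mathrm{eff}}(\alpha)$ describes the rate that the user will receive
(penalizing under-estimation of the rate), while $U(\alpha)$ describes
the loss due to lost bits (penalizing over-estimation of the rate).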
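The two performance measures can be evaluated on a sampled trajectory as in the sketch below. This is an illustration only: it assumes a uniform spatial grid so that the integrals reduce to sums, it takes powers and the noise level in dBm, and the function and argument names are hypothetical.

```python
import numpy as np

def rate_metrics(p_true_dbm, p_pred_dbm, v_pred, noise_dbm, alpha):
    """Average effective rate and fraction of undelivered bits on a uniform grid."""
    snr_true = 10.0 ** ((np.asarray(p_true_dbm, float) - noise_dbm) / 10.0)
    # Back off the predicted power by alpha predictive standard deviations, cf. (37).
    p_cons = np.asarray(p_pred_dbm, float) - alpha * np.sqrt(np.asarray(v_pred, float))
    snr_pred = 10.0 ** ((p_cons - noise_dbm) / 10.0)
    r_true = np.log2(1.0 + snr_true)
    r_pred = np.log2(1.0 + snr_pred)
    r_eff = np.minimum(r_pred, r_true)                     # effective rate, cf. (38)
    avg_eff = r_eff.mean()                                 # spatial average, cf. (39)
    undelivered = (r_pred - r_eff).sum() / r_pred.sum()    # lost fraction, cf. (40)
    return float(avg_eff), float(undelivered)
```

For example, if the true SNR is 1 (rate 1 bpu) while the predicted SNR is 3 (rate 2 bpu), only half of the allocated bits are delivered, so the undelivered fraction is 0.5.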

3) Predicted communication rates with uncertain training
locations: We predict the CQM at known test locations $\mathbf{x}_* \in \mathcal{X}$,
based on training with uncertain locations (considering
$\lambda \in \{0, 10\}$ m), all within a one-dimensional region $\mathcal{A}$. The
average effective rate $\bar{r}^{\mathrm{eff}}(\alpha)$ and the fraction of undelivered
bits $U(\alpha)$, as a function of $\alpha$, are shown in Fig. 8(a) and
(b), respectively. As expected, increasing $\alpha$ leads to a more
conservative allocation, thus reducing both $\bar{r}^{\mathrm{eff}}(\alpha)$ and $U(\alpha)$.
For a specific value of $\alpha$, an increase in $\lambda$ decreases $\bar{r}^{\mathrm{eff}}(\alpha)$. This
is because, as $\lambda$ increases, the mean $\bar{P}_{\mathrm{RX}}(\mathbf{x}_*)$
is of poor quality and the variance $V_{\mathrm{RX}}(\mathbf{x}_*)$ is high for CQM
predictions.

It is evident that when $\lambda = 0$, uGP and cGP attain similar
performance, both in terms of $\bar{r}^{\mathrm{eff}}(\alpha)$ and $U(\alpha)$. When $\lambda$ is
increased to 10 m, cGP suffers from a significant reduction in
effective rate $\bar{r}^{\mathrm{eff}}(\alpha)$, while at the same time dropping up to
4.5 % of the bits. This is due to cGP's poor predictions, which
are either too low (leading to a reduction in $\bar{r}^{\mathrm{eff}}(\alpha)$) or too
high (leading to an increase in $U(\alpha)$). In contrast, uGP, which
is able to track the channel well despite uncertain training,
achieves a higher effective rate, especially for high confidence
values (e.g., around 2 times higher rate for $\alpha = 3$, for $U(\alpha)$
less than 0.1%).


VII. Conclusion 

Channel quality metrics can be predicted using spatial 
regression tools such as Gaussian processes (GP). We have 
studied the impact of location uncertainties on GP and have 
demonstrated that, when heterogeneous location uncertainties 
are present, the classical GP framework is unable to (i) 
learn the underlying channel parameters properly; (ii) predict 
the expected channel quality metric. By introducing a GP
that operates directly on the location distribution, we find
uncertain GP (uGP), which is able to both learn and predict
in the presence of location uncertainties. This translates into
better performance when using uGP for predictive resource
allocation.

Possible avenues of future research include validation using 
real measurements, modeling correlation of shadowing in 
the temporal dimension, study of better approximations for 
learning with uncertain locations, and the extension to ad-hoc 
networks. 


Appendix A 

Approximation of Expected Mean Function 

Let $d_i = \|\mathbf{x}_i\|$ and recall from random variable transformation
theory that

$$\int \log_{10}(\|\mathbf{x}_i\|)\, p(\mathbf{x}_i)\, d\mathbf{x}_i = \int_{0}^{\infty} \log_{10}(d_i)\, p(d_i)\, dd_i. \tag{41}$$






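We assume $p(\mathbf{x}_i) = \mathcal{N}(\mathbf{z}_i, \sigma_i^2 \mathbf{I})$, so $p(d_i)$ follows a Rician
distribution

$$p(d_i) = \frac{d_i}{\sigma_i^2} \exp\!\Big( -\frac{d_i^2 + \|\mathbf{z}_i\|^2}{2 \sigma_i^2} \Big)\, I_0\!\Big( \frac{\|\mathbf{z}_i\|\, d_i}{\sigma_i^2} \Big), \tag{42}$$

where $I_0(\cdot)$ is the modified Bessel function of the first kind of zeroth order.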
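For $\|\mathbf{z}_i\| / \sigma_i > 3$, $p(d_i)$ can be approximated as a Gaussian
distribution

$$p_{\mathrm{Gauss}}(d_i) = \frac{1}{\sqrt{2 \pi \sigma_i^2}} \exp\!\Big( -\frac{(d_i - \|\mathbf{z}_i\|)^2}{2 \sigma_i^2} \Big). \tag{43}$$

The integral (41) still does not have a closed-form expression
with $p_{\mathrm{Gauss}}(d_i)$. Now approximating the $\log_{10}(\cdot)$ function with
a polynomial function of the form $w(d_i) = \sum_{j=0}^{J} a_j d_i^j$,
(41) can be written as

$$\int \log_{10}(\|\mathbf{x}_i\|)\, p(\mathbf{x}_i)\, d\mathbf{x}_i \approx \int_{-\infty}^{+\infty} w(d_i)\, p_{\mathrm{Gauss}}(d_i)\, dd_i, \tag{44}$$

which can be computed exactly.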


Appendix B 
Learning Procedure 

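In this appendix, we detail the learning of $\boldsymbol{\theta} = [\sigma_n, \sigma_{\mathrm{proc}}, d_c, L_0, \eta, \sigma_\Psi]$
for cGP, uGP, and MCGP. We assume
nodes know $\sigma_n$ and $L_0$; therefore they are not estimated
as part of the learning process. Let the remaining set of
hyperparameters be $\boldsymbol{\theta} = [\sigma_{\mathrm{proc}}, d_c, \sigma_\Psi]$ and $\eta$.

cGP

Based on Section II, we can write the received measurements
$\mathbf{y}$ with their corresponding training locations $\mathbf{X}$ in
matrix form as

$$\mathbf{y} = \mathbf{1} L_0 + \mathbf{h}_c\, \eta + \boldsymbol{\Psi} + \mathbf{n}, \tag{45}$$

where $\boldsymbol{\Psi} = [\Psi(\mathbf{x}_1), \ldots, \Psi(\mathbf{x}_N)]^{\mathsf{T}}$, $\mathbf{n} = [n_1, \ldots, n_N]^{\mathsf{T}}$, and
$\mathbf{h}_c = -10\,[\log_{10}(\|\mathbf{x}_1\|), \ldots, \log_{10}(\|\mathbf{x}_N\|)]^{\mathsf{T}}$. Assuming the
measurements are uncorrelated, the least squares estimate
of the path-loss exponent can be computed as

$$\hat{\eta} = (\mathbf{h}_c^{\mathsf{T}} \mathbf{h}_c)^{-1} \mathbf{h}_c^{\mathsf{T}} (\mathbf{y} - \mathbf{1} L_0). \tag{46}$$

Once the path-loss exponent is estimated, the mean component
of the received measurements can be subtracted as
$\boldsymbol{\Upsilon}_c = \mathbf{y} - \mathbf{1} L_0 - \mathbf{h}_c\, \hat{\eta}$. Then, $\boldsymbol{\Upsilon}_c$ becomes a zero-mean
Gaussian process. Now the likelihood function (?) becomes
$l(\boldsymbol{\theta}) = p(\boldsymbol{\Upsilon}_c|\mathbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\Upsilon}_c; \mathbf{0}, \mathbf{K})$. The hyperparameters $\boldsymbol{\theta}$
are estimated by minimizing the negative logarithm of $l(\boldsymbol{\theta})$:

$$\begin{aligned}
\hat{\boldsymbol{\theta}} &= \arg\min_{\boldsymbol{\theta}} \big\{ -\log p(\boldsymbol{\Upsilon}_c|\mathbf{X}, \boldsymbol{\theta}) \big\} \\
&= \arg\min_{\boldsymbol{\theta}} \big\{ \log|\mathbf{K}| + \boldsymbol{\Upsilon}_c^{\mathsf{T}} \mathbf{K}^{-1} \boldsymbol{\Upsilon}_c \big\}.
\end{aligned} \tag{47}$$

We calculate the variance of the process $\boldsymbol{\Upsilon}_c$ as $\sigma_{\mathrm{tot}}^2 = (1/N) \sum_{i=1}^{N} [\boldsymbol{\Upsilon}_c]_i^2$.
The variance of the process should be captured
by the hyperparameters $\sigma_{\mathrm{proc}}$, $\sigma_n$, and $\sigma_\Psi$. We define
$\sigma_{\mathrm{proc}}^2 = \sigma_{\mathrm{tot}}^2 - \sigma_n^2 - \sigma_\Psi^2$. As a result, $l(\boldsymbol{\theta})$ becomes a function
of only $d_c$ and $\sigma_\Psi$. We solve (47) and find $\hat{d}_c$ and $\hat{\sigma}_\Psi$ by an
exhaustive grid search. Once $\hat{d}_c$ and $\hat{\sigma}_\Psi$ are found, $\hat{\sigma}_{\mathrm{proc}}$
can be calculated as $\hat{\sigma}_{\mathrm{proc}}^2 = \sigma_{\mathrm{tot}}^2 - \sigma_n^2 - \hat{\sigma}_\Psi^2$.

uGP

In this case, the path-loss exponent is estimated as

$$\hat{\eta} = (\mathbf{h}_u^{\mathsf{T}} \mathbf{h}_u)^{-1} \mathbf{h}_u^{\mathsf{T}} (\mathbf{y} - \mathbf{1} L_0), \tag{48}$$

where $\mathbf{h}_u = -10\,\big[\mathbb{E}_{\mathbf{x}_1}[\log_{10}(\|\mathbf{x}_1\|)], \ldots, \mathbb{E}_{\mathbf{x}_N}[\log_{10}(\|\mathbf{x}_N\|)]\big]^{\mathsf{T}}$.
Once again removing the mean from the measurements, we
obtain $\boldsymbol{\Upsilon}_u = \mathbf{y} - \mathbf{1} L_0 - \mathbf{h}_u\, \hat{\eta}$. The hyperparameters $\boldsymbol{\theta}$ are
estimated by minimizing the modified negative log-likelihood
function

$$\begin{aligned}
\hat{\boldsymbol{\theta}} &= \arg\min_{\boldsymbol{\theta}} \big\{ -\log p(\boldsymbol{\Upsilon}_u|\mathbf{U}, \boldsymbol{\theta}) \big\} \\
&= \arg\min_{\boldsymbol{\theta}} \big\{ \log|\mathbf{K}_u| + \boldsymbol{\Upsilon}_u^{\mathsf{T}} \mathbf{K}_u^{-1} \boldsymbol{\Upsilon}_u \big\}.
\end{aligned} \tag{49}$$

Again, $\sigma_{\mathrm{tot}}^2 = (1/N) \sum_{i=1}^{N} [\boldsymbol{\Upsilon}_u]_i^2$ is the variance of the process.
As a result, $\sigma_\Psi^2$ becomes $\sigma_{\mathrm{tot}}^2 - \sigma_n^2 - \sigma_{\mathrm{proc}}^2$; due to
this, $l(\boldsymbol{\theta})$ is now only a function of $d_c$. We solve (49) and find
$\hat{d}_c$ by an exhaustive grid search.

The learning process can be simplified for uGP: since $\sigma_{\mathrm{proc}}$
only captures kernel mismatch irrespective of the location
uncertainty and path loss, the value of $\sigma_{\mathrm{proc}}$ can be obtained
offline with noise-free training locations by performing learning
as in the case of cGP, but with a covariance function of the
form (4) for $p = 2$. This approach gives an advantage to cGP
and thus makes the comparison between uGP and cGP more
fair for all values of $\lambda > 0$.

MCGP

It is no longer feasible to estimate $\eta$ first and subtract it
to make the process zero mean, because of the summation in
the Monte Carlo integration (?). Therefore, we optimize
(13) with respect to the hyperparameters $\eta$ and $\boldsymbol{\theta}$ using the
fminsearch function of MATLAB.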
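The least-squares step shared by the cGP and uGP procedures can be sketched as follows. This is an illustrative example with hypothetical names; for uGP, the `distances` argument would have to be replaced so that the regressor uses the expected log-distances rather than the log of known distances.

```python
import numpy as np

def estimate_path_loss_exponent(y, distances, L0):
    """Least-squares estimate of eta in y = 1*L0 + h*eta + (zero-mean terms),
    with h = -10*log10(d), followed by removal of the mean component."""
    h = -10.0 * np.log10(np.asarray(distances, float))
    resid = np.asarray(y, float) - L0
    eta_hat = float(h @ resid / (h @ h))   # scalar form of (h^T h)^{-1} h^T (y - 1 L0)
    detrended = resid - h * eta_hat        # zero-mean residual used in the likelihood
    return eta_hat, detrended
```

With noise-free measurements generated from a path-loss exponent of 2.5, the estimate recovers 2.5 and the detrended residual is zero, which is a quick sanity check before running the grid search.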